Some observations about false negatives

Boris 'pi' Piwinger 3.14 at logic.univie.ac.at
Thu May 29 09:26:41 CEST 2003


Hi!

Bogofilter is really very good by now, but some spam still
comes through. For me this is almost always spam in
German. There seems to be something special about this
language. Perhaps a typical German text simply uses many
more distinct words than an English one. Of course, it
matters here that all my ham is German or English, so any
other language will probably look like spam anyway. Another
reason might be that German inflects words much more than
English does, so every word is likely to appear in quite a
few different forms.
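The inflection point can be illustrated with a small, hypothetical example. It assumes a tokenizer that treats every distinct lowercased string as its own token (a simplification for illustration, not a description of bogofilter's actual lexer) and counts in how many messages each token appears. Four spams that use the same word in four different forms leave each form with a count of only one.

```python
from collections import Counter


def message_counts(messages):
    """Count in how many messages each token appears (once per message).

    Assumption: tokens are whitespace-separated, lowercased strings, so every
    inflected form is a separate token.
    """
    counts = Counter()
    for msg in messages:
        counts.update(set(msg.lower().split()))
    return counts


# Hypothetical spam corpus: the same German word in four inflected forms.
spams = [
    "ein Angebot für Sie",
    "neue Angebote für Sie",
    "mit unseren Angeboten",
    "Details des Angebots",
]

counts = message_counts(spams)
forms = ["angebot", "angebote", "angeboten", "angebots"]
per_form = {f: counts[f] for f in forms}

print(per_form)                 # each form was seen in only one spam
print(sum(per_form.values()))   # the underlying word was seen in four
```

The evidence a filter could have gathered for one word is spread over four tokens, so each of them needs correspondingly more training before it carries weight.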

But what really bugs me is that even after training, a lot
of those messages are still rated as ham. Looking at the -vvv
output, the usual picture is that many words show up on the
good side and many on the bad side. It looks like spammers
cannot really avoid using good words (a closer look shows
this is not some strange wording chosen to achieve this
effect). But their occurrence in spam is not enough to make
those words neutral.
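Why a few trained spams do not neutralize a good word can be made concrete with a small calculation. The sketch below assumes a Robinson-style per-token score (Gary Robinson's f(w), the kind of per-word formula used in bogofilter's Robinson-Fisher method): p(w) = b/(b+g) on counts normalized by corpus size, shrunk toward a prior x with strength s. The values s=1.0 and x=0.5, the corpus sizes, and the helper names are illustrative assumptions, not bogofilter defaults or functions. It counts how many newly trained spams containing a word are needed before that word scores as neutral.

```python
def token_score(good, bad, n_good, n_bad, s=1.0, x=0.5):
    """Robinson-style score for one token: near 0 is hammy, near 1 is spammy.

    good, bad: number of ham / spam messages that contain the token.
    n_good, n_bad: total number of ham / spam messages trained.
    s, x: strength and value of the prior belief (illustrative values only).
    """
    n = good + bad
    if n == 0:
        return x  # unseen token: fall back to the prior
    g = good / n_good
    b = bad / n_bad
    p = b / (b + g)  # corpus-normalized spam probability
    return (s * x + n * p) / (s + n)


def spams_until_neutral(good, n_good, n_bad):
    """Train spams that all contain the token until its score reaches 0.5.

    Each trained spam raises both the token's spam count and the size of
    the spam corpus (hypothetical helper for this example).
    """
    bad = 0
    while token_score(good, bad, n_good, n_bad) < 0.5:
        bad += 1
        n_bad += 1
    return bad


# Hypothetical numbers: the word occurs in 300 of 3000 hams,
# and the spam corpus starts with 1000 messages that lack it.
print(round(token_score(300, 0, 3000, 1000), 4))  # 0.0017
print(spams_until_neutral(300, 3000, 1000))       # 112
```

With x=0.5 the score reaches 0.5 exactly when the word is as frequent in the spam corpus as in the ham corpus, so under these assumed numbers it takes 112 additional spams containing the word before it stops counting as a ham indicator.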

pi