Some observations about false negatives

David Relson relson at osagesoftware.com
Thu May 29 14:05:24 CEST 2003


At 03:26 AM 5/29/03, Boris 'pi' Piwinger wrote:
>Hi!
>
>Bogofilter is really very good by now, but some spam still
>comes through. For me this is almost always spam in
>German. There seems to be something special about this
>language. Perhaps an average German text uses many more
>distinct words than an English one. Of course, what matters
>here is that all my ham is in German or English, so any
>other language will look like spam, I guess. Another reason
>might be that German words change form much more than
>English ones, so every word is likely to appear in quite a
>lot of different forms.
>
>But what really bugs me is that even after training, a lot
>of those messages are still rated as ham. Looking at -vvv,
>the usual situation is that many, many words show up on
>both the good and the bad side. It looks like spammers
>cannot really avoid using good words (a closer look shows
>this is not some strange wording chosen to achieve this
>effect). But this is not enough to make those words neutral.
>
>pi

Hi pi,

Have you done any robs/min_dev tuning or are you still using the default 
values of robs and min_dev?  Greg and I have done a number of experiments 
with varying these values.  Bogofilter's default values are adequate, but 
probably not optimal for any particular site (since the optimal values vary 
according to the mix of messages _you_ receive).
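Roughly speaking, robs and min_dev act on each token's score. The sketch below follows Gary Robinson's formula, which is what bogofilter's robs (s) and robx (x) parameters refer to. A token seen n times with raw spam probability p(w) gets f(w) = (s*x + n*p(w)) / (s + n), and tokens whose f(w) lies within min_dev of 0.5 are left out of the message score. The function names token_score and is_ignored are hypothetical helpers, not bogofilter commands. The sample numbers are illustrative, not bogofilter's defaults, and bogofilter's own code differs in details.

```sh
#!/bin/sh
# Sketch only: Robinson-style token scoring. token_score and is_ignored
# are hypothetical helpers, not part of bogofilter.
#
# token_score GOOD BAD NGOOD NBAD ROBS ROBX
#   GOOD/BAD   - times the token was seen in ham/spam
#   NGOOD/NBAD - number of ham/spam messages in the wordlists
#   ROBS/ROBX  - Robinson's s (strength) and x (assumed score of unknown words)
token_score() {
    awk -v g="$1" -v b="$2" -v ng="$3" -v nb="$4" -v s="$5" -v x="$6" 'BEGIN {
        n = g + b
        gr = (ng > 0) ? g / ng : 0        # fraction of ham containing it
        br = (nb > 0) ? b / nb : 0        # fraction of spam containing it
        if (n == 0 || br + gr == 0) { printf "%.4f\n", x; exit }
        p = br / (br + gr)                # raw spam probability of the token
        f = (s * x + n * p) / (s + n)     # pull rare tokens toward x
        printf "%.4f\n", f
    }'
}

# is_ignored SCORE MIN_DEV: succeeds when SCORE is within MIN_DEV of 0.5,
# i.e. the token would be dropped from the message's spamicity.
is_ignored() {
    awk -v f="$1" -v md="$2" 'BEGIN {
        d = f - 0.5
        if (d < 0) d = -d
        if (d < md) exit 0; else exit 1
    }'
}
```

With s=1 and x=0.5, a word seen 40 times in ham and 38 times in spam (1000 messages of each) scores 0.4873 and falls inside a min_dev of 0.1, so it is ignored rather than counted as a good word. A word seen only 3 times, all in spam, scores 0.8750 and still counts.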

It's easy to see the effect of varying min_dev.  Try the following:

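# Score one saved message ("message") at min_dev = 0.10, 0.15, ..., 0.45;
# the 0.451 upper bound keeps 0.45 from being lost to float rounding in seq.
for md in $(seq 0.10 0.05 0.451) ; do
     printf '%s ' "$md"
     bogofilter -v -m "$md" -d "$BOGOFILTER_DIR" < message
done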

If you want to run a tuning experiment, the scripts are in 
/usr/share/bogofilter/tuning (for recent versions of bogofilter).

Cheers!

David




