Accuracy is lacking

David Relson relson at osagesoftware.com
Thu Feb 13 22:29:50 CET 2003


At 03:44 PM 2/13/03, Tracy R Reed wrote:

>I don't get any false positives but it is missing a lot of spam. I would
>say 3/4 of the spam makes it into my inbox and only 1/4 gets filtered.
>Looking at the spamicity measurement in the header they all fall very near
>to 0.5. With bayespam the values for spam were always very high and
>non-spam very low so there were very few edge cases. Here everything seems
>to fall right on the line and most of it ends up in my inbox. I always
>send the misclassified spam back through bogofilter to correct the
>database but it does not seem to be gaining me anything.

Hi Tracy,

Am I right to guess that you're using the default configuration for
0.10.1.5, i.e. the Robinson algorithm?  It might be worth switching to
Robinson-Fisher (via "-f" on the command line or "algorithm=fisher" in the
config file).  RF tends to polarize the results, with spam shifted towards
1.0 and ham towards 0.0.  If you saw the test result messages I posted
earlier today, you'll know that RF (in tristate mode) is classifying 95% or
so of my incoming mail as ham or spam, with the remaining 5% being unsure.
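
To illustrate why RF polarizes scores, here is a rough Python sketch of the
two combining rules as they are usually described (geometric means for
Robinson, Fisher's chi-square combination for RF).  It is not bogofilter's
actual code: the function names are made up for this example, and it assumes
each token already has a spam probability strictly between 0 and 1.

```python
import math

def robinson_score(probs):
    """Geometric-mean combining (illustrative, not bogofilter's code)."""
    n = len(probs)
    # P and Q are one minus the geometric means of (1 - p) and p.
    p = 1.0 - math.exp(sum(math.log(1.0 - x) for x in probs) / n)
    q = 1.0 - math.exp(sum(math.log(x) for x in probs) / n)
    return (1.0 + (p - q) / (p + q)) / 2.0

def chi2q(x, dof):
    """Upper-tail chi-square probability; valid for even dof only."""
    m = x / 2.0
    term = math.exp(-m)
    total = term
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)

def fisher_score(probs):
    """Fisher (chi-square) combining (illustrative sketch)."""
    n = len(probs)
    # s approaches 1 when the tokens are collectively spammy,
    # h approaches 1 when they are collectively hammy.
    s = 1.0 - chi2q(-2.0 * sum(math.log(1.0 - x) for x in probs), 2 * n)
    h = 1.0 - chi2q(-2.0 * sum(math.log(x) for x in probs), 2 * n)
    return (1.0 + s - h) / 2.0

# Twenty tokens that all lean mildly one way (hypothetical values).
for leaning in (0.7, 0.3):
    tokens = [leaning] * 20
    print(leaning, robinson_score(tokens), fisher_score(tokens))
```

With inputs like these the geometric-mean score stays at the per-token
value, while the chi-square score moves noticeably closer to 1.0 or 0.0.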

>Normally mail that is flagged as spam is procmailed into my spam folder.
>Spam that ends up in my mailbox gets piped through bogofilter for
>correction and then saved in the spam folder. Here is all of the spam I
>manually saved into my spam folder this week that was misclassified:
>

... [snip] ...

>Not much difference in some cases. Any suggestions?

Unfortunately, one can't tell much from the scores alone.  When I'm curious
about a result bogofilter has generated, I often find it useful to run
bogofilter manually on the message.  With either "-r" or "-f", adding "-vv"
gives you a histogram, and "-vvv" gives you the full list of words and
their scores.  The histogram gives a visual display that I find informative.
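
If you want to script that, a minimal Python sketch might look like the
following.  It uses only the options mentioned above ("-r", "-f", "-vv",
"-vvv") and assumes bogofilter is on your PATH and reads the message from
standard input; the helper names and the message path are hypothetical.

```python
import shutil
import subprocess

def bogofilter_command(algorithm_flag="-f", verbosity=2):
    """Build the argument list; only flags named in this thread are used."""
    if algorithm_flag not in ("-r", "-f"):
        raise ValueError("algorithm_flag must be '-r' or '-f'")
    if verbosity not in (2, 3):
        raise ValueError("verbosity must be 2 (histogram) or 3 (word list)")
    return ["bogofilter", algorithm_flag, "-" + "v" * verbosity]

def explain_message(message_path, algorithm_flag="-f", verbosity=2):
    """Run bogofilter on one saved message and return its verbose output."""
    cmd = bogofilter_command(algorithm_flag, verbosity)
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError("bogofilter not found on PATH")
    with open(message_path, "rb") as message:
        # check=False: a nonzero exit status is not treated as a failure here.
        result = subprocess.run(cmd, stdin=message, stdout=subprocess.PIPE,
                                stderr=subprocess.STDOUT, check=False)
    return result.stdout.decode(errors="replace")

# Example (hypothetical path to a saved message):
# print(explain_message("missed-spam.msg", "-f", 2))
```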

David




