Accuracy is lacking
David Relson
relson at osagesoftware.com
Thu Feb 13 22:29:50 CET 2003
At 03:44 PM 2/13/03, Tracy R Reed wrote:
>I don't get any false positives but it is missing a lot of spam. I would
>say 3/4 of the spam makes it into my inbox and only 1/4 gets filtered.
>Looking at the spamicity measurement in the header they all fall very near
>to 0.5. With bayespam the values for spam were always very high and
>non-spam very low so there were very few edge cases. Here everything seems
>to fall right on the line and most of it ends up in my inbox. I always
>send the misclassified spam back through bogofilter to correct the
>database but it does not seem to be gaining me anything.
Hi Tracy,
Am I right to guess that you're using the default configuration for
0.10.1.5, i.e. the robinson algorithm? It might be worth switching to
Robinson-Fisher (via "-f" on the command line or "algorithm=fisher" in the
config file). RF tends to polarize the results, shifting spam towards
1.0 and ham towards 0.0. If you saw my test result messages earlier today,
you know that RF (in tristate mode) is classifying about 95% of my
incoming mail as ham or spam, with the remaining 5% marked unsure.
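To show why the two methods score the same message differently, here is a minimal Python sketch of Gary Robinson's two published combining rules: the geometric-mean rule, and the Fisher (inverse chi-square) rule that "-f" selects. It is not bogofilter's C code. Bogofilter computes its per-word probabilities its own way and applies further adjustments that are omitted here. The word probabilities in the example and the tristate cutoffs (0.10 and 0.95) are illustrative assumptions, not bogofilter defaults.

```python
import math


def chi2_survival_even_df(x2: float, df: int) -> float:
    """P(X >= x2) for a chi-square variable with an even number of degrees of freedom.

    Closed form: exp(-m) * sum_{i=0}^{df/2 - 1} m**i / i!, with m = x2 / 2.
    The terms are accumulated in log space so large inputs do not underflow early.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    m = x2 / 2.0
    if m <= 0.0:
        return 1.0
    log_term = -m  # log of the i = 0 term
    log_terms = [log_term]
    for i in range(1, df // 2):
        log_term += math.log(m) - math.log(i)
        log_terms.append(log_term)
    top = max(log_terms)
    total = top + math.log(sum(math.exp(t - top) for t in log_terms))
    return min(1.0, math.exp(total))


def _check(probs: list[float]) -> None:
    # Both rules take logs of f and 1 - f, so every probability must be strictly inside (0, 1).
    if not probs or any(not 0.0 < f < 1.0 for f in probs):
        raise ValueError("need at least one probability strictly between 0 and 1")


def robinson_geometric(probs: list[float]) -> float:
    """Geometric-mean rule: S = (1 + (P - Q) / (P + Q)) / 2."""
    _check(probs)
    n = len(probs)
    p = 1.0 - math.exp(sum(math.log(1.0 - f) for f in probs) / n)  # 1 - geomean(1 - f)
    q = 1.0 - math.exp(sum(math.log(f) for f in probs) / n)        # 1 - geomean(f)
    return (1.0 + (p - q) / (p + q)) / 2.0


def robinson_fisher(probs: list[float]) -> float:
    """Fisher rule: I = (1 + H - S) / 2, using chi-square survival with 2n degrees of freedom."""
    _check(probs)
    n = len(probs)
    # h is close to 1 when the f values are collectively high (spammy).
    h = chi2_survival_even_df(-2.0 * sum(math.log(f) for f in probs), 2 * n)
    # s is close to 1 when the f values are collectively low (hammy).
    s = chi2_survival_even_df(-2.0 * sum(math.log(1.0 - f) for f in probs), 2 * n)
    return (1.0 + h - s) / 2.0


def tristate(score: float, ham_cutoff: float = 0.10, spam_cutoff: float = 0.95) -> str:
    """Three-way split; the cutoff values are hypothetical, chosen for illustration."""
    if score >= spam_cutoff:
        return "spam"
    if score <= ham_cutoff:
        return "ham"
    return "unsure"


# Three strongly spammy words diluted by seven neutral ones.
words = [0.99] * 3 + [0.5] * 7
print(f"geometric: {robinson_geometric(words):.3f} -> {tristate(robinson_geometric(words))}")
print(f"fisher:    {robinson_fisher(words):.3f} -> {tristate(robinson_fisher(words))}")
```

With this input the geometric-mean rule lands around 0.69, near the middle, while the Fisher rule lands close to 1.0. The neutral words drag an average towards 0.5. The chi-square test instead asks whether the strong words are unlikely to occur by chance, so a few strong words are enough to push the score towards an extreme.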
>Normally mail that is flagged as spam is procmailed into my spam folder.
>Spam that ends up in my mailbox gets piped through bogofilter for
>correction and then saved in the spam folder. Here is all of the spam I
>manually saved into my spam folder this week that was misclassified:
>
... [snip] ...
>Not much difference in some cases. Any suggestions?
Unfortunately, one can't tell much from the scores alone. When I'm curious
about a result, I often find it useful to run bogofilter on the message
manually. With either "-r" or "-f", adding "-vv" prints a histogram, and
"-vvv" prints the full list of words and their scores. I find the
histogram's visual display informative.
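The idea behind such a histogram can be sketched in a few lines of Python: bucket the per-word scores into ten bins of width 0.1 and draw one bar per bin. This does not parse or reproduce bogofilter's actual "-vv" output format. The input dictionary of word scores and the function names are hypothetical, for illustration only.

```python
def score_histogram(word_scores: dict[str, float], bins: int = 10) -> list[int]:
    """Count how many word scores fall into each of `bins` equal-width bins on [0, 1]."""
    counts = [0] * bins
    for word, score in word_scores.items():
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score for {word!r} is outside [0, 1]: {score}")
        idx = min(int(score * bins), bins - 1)  # a score of exactly 1.0 goes in the last bin
        counts[idx] += 1
    return counts


def render_histogram(counts: list[int]) -> str:
    """One text line per bin: the bin range, the count, and a bar of '#' characters."""
    bins = len(counts)
    lines = []
    for i, count in enumerate(counts):
        lo, hi = i / bins, (i + 1) / bins
        lines.append(f"{lo:4.2f}-{hi:4.2f} {count:4d} {'#' * count}")
    return "\n".join(lines)


# Hypothetical word scores for one message.
scores = {"viagra": 0.97, "unsubscribe": 0.95, "free": 1.0, "meeting": 0.05, "the": 0.5}
print(render_histogram(score_histogram(scores)))
```

A message whose bars pile up in the middle bins is one where few words carry real evidence. Bars clustered at both ends mean strong but conflicting evidence, which is the kind of detail a single final score hides.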
David