FAQ update

David Relson relson at osagesoftware.com
Wed Feb 19 05:51:48 CET 2003


At 10:22 PM 2/18/03, Eric Hanchrow wrote:

> >>>>> "David" == David Relson <relson at osagesoftware.com> writes:
>
>     David> What other questions and answers have y'all asked that
>     David> should be included?  Please post them to the mailing list
>     David> so we can all contribute.
>
>I don't know if anyone but me has asked this question frequently, but
>I'd love to know: why is bogofilter only catching 85% of my spam, and
>not the 99% I'd hoped it would catch?

Hi Eric,

Welcome to the mailing list.  Did you have to ask such a tough question in 
your first message?  I can respond to it, but I don't think I can fully answer it.

For bogofilter to be effective, it needs to be trained on what you consider 
spam and what you consider ham (good mail).  It saves that information in its 
wordlists.  When a new message arrives, bogofilter compares the words in it 
to the words in the wordlists and computes a score that indicates whether 
the message is spam or ham.
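As a rough illustration of that train-then-score cycle, here is a minimal Python sketch.  It is NOT bogofilter's code: the tokenizer, the `WordLists` class, and the constants `s` and `x` are hypothetical, and the way per-word values are combined follows Gary Robinson's geometric-mean method rather than whatever algorithm a particular bogofilter version uses.  The last few lines show why training on the spam that slips through helps: a message full of never-seen words scores a neutral 0.5 until it is fed back in as spam.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Crude hypothetical tokenizer: the set of lowercase words in a message."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


class WordLists:
    """Hypothetical stand-in for bogofilter's spam and ham wordlists."""

    def __init__(self):
        self.spam = Counter()  # word -> number of spam messages containing it
        self.ham = Counter()   # word -> number of ham messages containing it
        self.nspam = 0
        self.nham = 0

    def train(self, text, is_spam):
        """Record each distinct word of one message as spam or ham evidence."""
        words = tokenize(text)
        if is_spam:
            self.spam.update(words)
            self.nspam += 1
        else:
            self.ham.update(words)
            self.nham += 1

    def word_prob(self, word, s=1.0, x=0.5):
        """Robinson-style f(w): how spammy one word looks, pulled toward x
        when the word has appeared in only a few messages."""
        b = self.spam[word] / max(self.nspam, 1)
        g = self.ham[word] / max(self.nham, 1)
        n = self.spam[word] + self.ham[word]
        p = b / (b + g) if b + g > 0 else x
        return (s * x + n * p) / (s + n)

    def score(self, text):
        """Combine the per-word values into one score in (0, 1);
        near 1 means spam, near 0 means ham."""
        probs = [self.word_prob(w) for w in tokenize(text)]
        if not probs:
            return 0.5
        n = len(probs)
        # Geometric means, taken in log space to avoid underflow.
        p_spam = 1 - math.exp(sum(math.log(1 - f) for f in probs) / n)
        q_ham = 1 - math.exp(sum(math.log(f) for f in probs) / n)
        return (1 + (p_spam - q_ham) / (p_spam + q_ham)) / 2


if __name__ == "__main__":
    wl = WordLists()
    wl.train("cheap meds buy now limited offer", is_spam=True)
    wl.train("buy cheap watches now", is_spam=True)
    wl.train("meeting notes for the bogofilter release", is_spam=False)
    wl.train("lunch tomorrow to discuss the release", is_spam=False)

    print(wl.score("buy cheap meds now"))    # well above 0.5: spam
    print(wl.score("notes on the release"))  # well below 0.5: ham

    # A new trick made of never-seen words gets no evidence either way ...
    print(wl.score("free vacation prize"))   # 0.5
    # ... until the missed message is used for training.
    wl.train("free vacation prize", is_spam=True)
    print(wl.score("free vacation prize"))   # now above 0.5
```

The real program's tokenizer, database, and scoring are far more careful; the point here is only that the score can be no better than the wordlists behind it.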

A newly arrived message may be like previous messages or it may 
not.  Bogofilter does what it can to measure the "likeness" and generate 
the proper spam score from that info.  In the first case, bogofilter 
should classify the message correctly.  In the second case, bogofilter 
will be unable to classify it correctly.  One _could_ say that a properly 
trained bogofilter will get the classification right, and one _could_ say 
that improper training leads to improper classification.  Stated like 
this, the responsibility for classification is on the trainer's 
shoulders.  This line of thought is just a bit severe :-)

More realistically, it's probably not possible to fully train a spam filter 
so that it catches all spam.  Even if we could train bogofilter to catch 
all spam written in the styles of the past, spammers are always developing new tricks 
and techniques - with the explicit goal of getting past the 
filters.  Whatever we can catch today, there's always a new form tomorrow 
that might be missed.  The best way I know to handle it is to appreciate 
the 85% that is caught and use the other 15% to train bogofilter so that 
next week (or month) it can catch even more.

Will bogofilter ever reach 99%?  For some people the answer is yes.  I 
believe Greg Louis recently reported a success rate that 
high.  However, neither he nor I expect the success rate to stay at that level.

I think bogofilter is catching about 90% of the spam coming into my mail 
server.  I don't have an exact figure because I don't see the need for 
one.  I _do_ know that very little spam gets past bogofilter to bother my 
users - a big difference from the pre-bogofilter days of 6 months ago.

So is it perfect?  No.  Will it ever be?  Maybe, but I doubt it.  Remember 
we're dealing with people who _really_ want their spam to be delivered and 
who will work hard towards that goal.  Of course, we're working hard 
towards the opposite goal.  Whether our goal is achieved or not, bogofilter 
_is_ making a difference in the lives of those who use it.

Cheers!

David
