FAQ update
David Relson
relson at osagesoftware.com
Wed Feb 19 05:51:48 CET 2003
At 10:22 PM 2/18/03, Eric Hanchrow wrote:
> >>>>> "David" == David Relson <relson at osagesoftware.com> writes:
>
> David> What other questions and answers have y'all asked that
> David> should be included? Please post them to the mailing list
> David> so we can all contribute.
>
>I don't know if anyone but me has asked this question frequently, but
>I'd love to know: why is bogofilter only catching 85% of my spam, and
>not the 99% I'd hoped it would catch?
Hi Eric,
Welcome to the mailing list. Did you have to ask such a tough question in
your first message? I can respond to it, but I don't think I can fully answer it.
For bogofilter to be effective, it needs to be trained on what you consider
spam and what you consider ham (good). It saves that information in its
wordlists. When a new message arrives, bogofilter compares the words in it
to the words in the wordlists and computes a score that indicates whether
the message is spam or ham.
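To make the train-then-score idea concrete, here is a minimal sketch in Python. It is an illustration only and is NOT bogofilter's actual algorithm or code: the class name, the tokenizer, the add-one smoothing and the 0.5 "don't know" score are all assumptions chosen to keep the example short.

```python
import math
import re
from collections import Counter


class WordlistFilter:
    """Toy word-count spam scorer (hypothetical; not bogofilter's method)."""

    def __init__(self):
        self.spam_words = Counter()  # stand-in for the spam wordlist
        self.ham_words = Counter()   # stand-in for the ham wordlist
        self.spam_msgs = 0
        self.ham_msgs = 0

    @staticmethod
    def tokenize(text):
        # Lowercase the text and keep each distinct word once per message.
        return set(re.findall(r"[a-z0-9$'-]+", text.lower()))

    def train(self, text, is_spam):
        """Record the words of one message in the matching wordlist."""
        words = self.tokenize(text)
        if is_spam:
            self.spam_words.update(words)
            self.spam_msgs += 1
        else:
            self.ham_words.update(words)
            self.ham_msgs += 1

    def score(self, text):
        """Return a value in (0, 1); higher means more spam-like."""
        log_odds = 0.0
        for word in self.tokenize(text):
            if word not in self.spam_words and word not in self.ham_words:
                continue  # never seen in training: no evidence either way
            # Add-one smoothing so a zero count never gives a zero probability.
            p_spam = (self.spam_words[word] + 1) / (self.spam_msgs + 2)
            p_ham = (self.ham_words[word] + 1) / (self.ham_msgs + 2)
            log_odds += math.log(p_spam / p_ham)
        # Clamp so exp() cannot overflow on very long messages.
        log_odds = max(-50.0, min(50.0, log_odds))
        return 1 / (1 + math.exp(-log_odds))


f = WordlistFilter()
f.train("cheap pills buy now", is_spam=True)
f.train("win money now", is_spam=True)
f.train("meeting agenda for tuesday", is_spam=False)
f.train("lunch on tuesday", is_spam=False)

print(f.score("buy cheap pills"))     # above 0.5: words seen only in spam
print(f.score("agenda for lunch"))    # below 0.5: words seen only in ham
print(f.score("refinance mortgage"))  # exactly 0.5: nothing known

# Train on the message that was missed, then score it again.
f.train("refinance mortgage", is_spam=True)
print(f.score("refinance mortgage"))  # now above 0.5
```

The third message shows the failure mode: with no words in either wordlist, the sketch has nothing to go on and returns a neutral score. Training on the miss is what moves the score the next time a similar message arrives.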
A newly arrived message may be like previous messages or it may
not. Bogofilter does what it can to measure the "likeness" and generate
the proper spam score from that info. If the message is like earlier ones,
bogofilter should classify it correctly. If it is not, bogofilter will be
unable to classify it correctly. One _could_ say that if
bogofilter is properly trained it will get the classification right and one
_could_ say that improper training leads to improper
classification. Stated like this, the responsibility for classification is
on the trainer's shoulders. This line of thought is just a bit severe :-)
More realistically, it's probably not possible to fully train a spam filter
so that it catches all spam. Even if we could train bogofilter to catch
all spam in styles of the past, spammers are always developing new tricks
and techniques - with the explicit goal of getting past the
filters. Whatever we can catch today, there's always a new form tomorrow
that might be missed. The best way I know to handle it is to appreciate
the 85% that is caught and use the other 15% to train bogofilter so that
next week (or month) it can catch even more.
Will bogofilter ever reach 99%? For some people the answer is yes. I
think Greg Louis recently reported a success rate that
high. However, neither he nor I expect the success rate to stay at that level.
I think bogofilter is catching about 90% of the spam coming into my mail
server. I don't have an exact figure because I don't see the need for
one. I _do_ know that very little spam gets past bogofilter to bother my
users - a big difference from the pre-bogofilter days of 6 months ago.
So is it perfect? No. Will it ever be? Maybe, but I doubt it. Remember
we're dealing with people who _really_ want their spam to be delivered and
who will work hard towards that goal. Of course, we're working hard
towards the opposite goal. Whether our goal is achieved or not, bogofilter
_is_ making a difference in the lives of those who use it.
Cheers!
David