garbage removal and 'outsiders noise'

Greg Louis glouis at dynamicro.on.ca
Wed Apr 16 22:51:48 CEST 2003


On 20030416 (Wed) at 1337:53 -0400, Jim Correia wrote:
> On Wednesday, April 16, 2003, at 01:22  PM, David Relson wrote:
> 
> >His most recent test, "Bogofilter parameters(continued)", shows that 
> >using  different parameters can have a major effect in making 
> >bogofilter more accurate.
> >
> >Conclusion, using a site's email to determine the best parameters for 
> >bogofilter can have a _big_ effect.
> >

> Is it naive of me to be running bogofilter with the defaults?

The defaults were chosen some time ago; they don't work badly, but
chances are one can do a lot better.

> I'm running with spam/ham classification, cutoff of 0.95.
> 
> (I notice that some of the false negatives are close to the cutoff, but 
> most are numerically far from it, so perhaps I am answering my own 
> question :-)
> 
> This catches about 90% of my spam (I retrain -Ns with the false 
> negatives) and haven't had a false positive yet.

Sounds as though you could lower the spam cutoff a bit without changing
anything else; that would increase the success rate catching spam (but
also the chance of getting some fp's -- if that happens, the spam
cutoff should go back up at least partway).
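
A rough sketch of that trade-off, for illustration only: this is not
bogofilter code, the scores are invented, and it assumes a message is
called spam when its score is at or above the cutoff. The function names
are hypothetical.

```python
def error_counts(spam_scores, ham_scores, cutoff):
    """Return (false_negatives, false_positives) for a given spam cutoff.

    Assumption: a message is classified as spam when score >= cutoff.
    """
    false_negatives = sum(1 for s in spam_scores if s < cutoff)
    false_positives = sum(1 for s in ham_scores if s >= cutoff)
    return false_negatives, false_positives


def lowest_cutoff_without_fp(spam_scores, ham_scores, candidates):
    """Pick the lowest candidate cutoff that still gives zero false positives."""
    safe = [c for c in sorted(candidates)
            if error_counts(spam_scores, ham_scores, c)[1] == 0]
    return safe[0] if safe else None


# Hypothetical scores, made up purely for illustration.
spam_scores = [0.99, 0.97, 0.93, 0.91, 0.60]
ham_scores = [0.02, 0.10, 0.35, 0.88]

for cutoff in (0.95, 0.90, 0.85):
    fn, fp = error_counts(spam_scores, ham_scores, cutoff)
    print(f"cutoff={cutoff:.2f} false negatives={fn} false positives={fp}")
```

With these made-up numbers, dropping the cutoff from 0.95 to 0.90 catches
two more spams at no cost, while 0.85 lets one good message through, which
is the point at which the cutoff should go back up.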

> At present there are 18659 good messages and 1569 spam messages in the 
> respective wordlists.

The spam message count is a bit light.  I wouldn't recommend trying to
optimize bogofilter's parameters (except spam cutoff) till you have at
least 5000 spams.  Also, it might be wise to stop adding nonspams for a
while; we don't have much experience with bogofilter's performance with
extremely lopsided training databases, and theoretically it's better to
keep them more even (one list 2-3 times the size of the other wouldn't
worry me, but an order of magnitude is quite a difference).
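
To put numbers on that, here is a small sketch using the counts quoted
above. The thresholds (3x, 10x, 5000 spams) are just the rules of thumb
from this message turned into code, not anything bogofilter enforces, and
the function names are hypothetical.

```python
def imbalance_ratio(good_count, spam_count):
    """Ratio of the larger message count to the smaller one."""
    larger, smaller = max(good_count, spam_count), min(good_count, spam_count)
    if smaller == 0:
        raise ValueError("both wordlists need at least one message")
    return larger / smaller


def describe_balance(good_count, spam_count, comfortable=3.0, lopsided=10.0):
    """Label the training balance using the rule-of-thumb thresholds."""
    ratio = imbalance_ratio(good_count, spam_count)
    if ratio <= comfortable:
        return "comfortable"
    if ratio < lopsided:
        return "borderline"
    return "lopsided"


def enough_spam_for_tuning(spam_count, minimum=5000):
    """True once the spam count reaches the suggested minimum for tuning."""
    return spam_count >= minimum


good, spam = 18659, 1569  # counts quoted in the message above
print(f"ratio: {imbalance_ratio(good, spam):.1f}")
print(f"balance: {describe_balance(good, spam)}")
print(f"enough spam for tuning: {enough_spam_for_tuning(spam)}")
```

The ratio comes out at roughly 12 to 1, so the lists are already past the
order-of-magnitude mark, and the spam count is well short of 5000.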

Hope that helps............
-- 
| G r e g  L o u i s          | gpg public key: finger     |
|   http://www.bgl.nu/~glouis |   glouis at consultronics.com |
| http://wecanstopspam.org in signatures fights junk email |



