garbage removal and 'outsiders noise'

Jim Correia jim.correia at pobox.com
Wed Apr 16 23:09:27 CEST 2003


On Wednesday, April 16, 2003, at 04:51  PM, Greg Louis wrote:

>> At present there are 18659 good messages and 1569 spam messages in the
>> respective wordlists.
>
> The spam message count is a bit light.  I wouldn't recommend trying to
> optimize bogofilter's parameters (except spam cutoff) till you have at
> least 5000 spams.  Also, it might be wise to stop adding nonspams for a
> while; we don't have much experience with bogofilter's performance with
> extremely lopsided training databases, and theoretically it's better to
> keep them more even (one list 2-3 times the size of the other wouldn't
> worry me, but an order of magnitude is quite a difference).

When I originally trained bogofilter the message counts were the same 
order of magnitude.
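
For reference, the imbalance under discussion can be worked out from the two counts quoted above. This is only a minimal arithmetic sketch: the function name `imbalance` is made up for illustration and is not part of bogofilter, and the 2-3x comfort range is just the rule of thumb from the quoted message.

```python
import math


def imbalance(good: int, spam: int) -> tuple[float, float]:
    """Return (larger/smaller ratio, that ratio expressed in orders of magnitude).

    Hypothetical helper for illustration only; not a bogofilter API.
    """
    larger, smaller = max(good, spam), min(good, spam)
    ratio = larger / smaller
    return ratio, math.log10(ratio)


# Message counts taken from the quoted text: 18659 good, 1569 spam.
ratio, magnitude = imbalance(good=18659, spam=1569)
print(f"good:spam ratio is about {ratio:.1f}:1")

# Rule of thumb from the quoted advice: 2-3x is fine, ~10x is a concern.
if magnitude >= 1:
    print("lists differ by an order of magnitude or more")
```

With these counts the good list comes out a little under twelve times the size of the spam list, so it is past the order-of-magnitude mark even though the lists started out comparable.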

I get a lot more legitimate mail than spam, roughly in the proportion 
you see in my wordlists. Spam has been light recently, roughly 12%; 
right before I put bogofilter into production use it was more like 25%. 
Still, I get enough spam in absolute numbers that I wanted to put 
something like bogofilter into place.

At present the only server-side filtering that is done on my mail is 
the spam classification (in -u mode, and I retrain on 
misclassifications). I suppose I could rework my procmail recipes so 
that no list mail is ever fed to bogofilter. (It would also be 
interesting to see what net effect that had on the accuracy of the 
classifications.)

For people running in -u mode, do you run all of your mail through 
bogofilter, or do you sift out lists and other whitelist candidates 
first?

(As an aside, while it would be nice if bogofilter caught 100% of my 
spam, in my limited use, about 1.5 weeks, it misses ~10%, but I 
haven't had one false positive, which is much more important to me.)

Thanks,
Jim




