My wordlist doesn't detect spam very well anymore

Jonathan Kamens jik at kamens.us
Sun Feb 9 13:47:37 CET 2020


In my experience, you need to save a big corpus of known, recent spam 
and ham messages and periodically run bogotune to determine the 
currently optimal parameters for your .bogofilter.cf file. Personally, I 
save copies of all spam and ham messages going back for months and run 
bogotune once per month. My .MSG_COUNT is currently 66,915 ham and 
114,479 spam.

The way I have things set up, unsure messages get put into my ham corpus 
automatically. When I reclassify a ham or unsure message as spam, it is 
moved into my spam corpus, and when I reclassify spam as ham, it is moved 
into the ham corpus.
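
A minimal sketch of that filing rule, in Python. The directory layout (one 
file per message under ham/ and spam/ in a corpus directory) and the name 
file_message are assumptions made for illustration, not a description of 
the actual setup:

```python
import shutil
from pathlib import Path

# Which corpus each label is filed under. "unsure" goes to ham, following
# the rule described above. The directory names are hypothetical.
CORPUS_FOR_LABEL = {"ham": "ham", "unsure": "ham", "spam": "spam"}


def file_message(message_path, label, corpus_root):
    """Move a message file into the corpus matching its label.

    Returns the new path. A message that is already in the right corpus
    is left where it is.
    """
    source = Path(message_path)
    target_dir = Path(corpus_root) / CORPUS_FOR_LABEL[label]
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / source.name
    if source.resolve() == target.resolve():
        return target
    shutil.move(str(source), str(target))
    return target
```

Calling file_message(path, "spam", corpus_root) on a message that currently 
sits in ham/ moves it to spam/, which is the reclassification case.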

I have a wrapper around the bogotune script which goes through the large 
corpus of messages that are about to be used for bogotune and, using the 
current settings in .bogofilter.cf, warns me if it believes any of them 
are misclassified. I review those and make sure they're all in the right 
place before I feed them into bogotune, to ensure that I'm not feeding 
any incorrect cues to bogotune.
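
A check like the one described above could look roughly like this Python 
sketch. It is not the actual wrapper. It assumes one message per file under 
ham/ and spam/ directories (a hypothetical layout) and relies on 
bogofilter's documented exit statuses: 0 for spam, 1 for non-spam, 2 for 
unsure. Treating an "unsure" verdict as acceptable for either corpus is 
also an assumption:

```python
import subprocess
from pathlib import Path

# Exit statuses as documented in the bogofilter man page.
EXIT_TO_LABEL = {0: "spam", 1: "ham", 2: "unsure"}


def classify_with_bogofilter(path):
    """Classify one message file using bogofilter's default configuration."""
    with open(path, "rb") as fh:
        result = subprocess.run(["bogofilter"], stdin=fh)
    if result.returncode not in EXIT_TO_LABEL:
        raise RuntimeError(
            f"bogofilter failed on {path} (exit {result.returncode})"
        )
    return EXIT_TO_LABEL[result.returncode]


def find_misclassified(corpus_root, classify=classify_with_bogofilter):
    """Return (path, filed_as, verdict) for every message whose verdict
    contradicts the corpus it is filed in."""
    suspects = []
    for filed_as in ("ham", "spam"):
        for path in sorted(Path(corpus_root, filed_as).iterdir()):
            if not path.is_file():
                continue
            verdict = classify(path)
            # Assumption: "unsure" is not treated as a contradiction.
            if verdict not in (filed_as, "unsure"):
                suspects.append((path, filed_as, verdict))
    return suspects
```

The classify argument is there so the check can be tried out with a 
stand-in function before pointing it at a real wordlist. Anything the 
function returns still needs a manual look before the corpus goes to 
bogotune.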

All of this together seems to do a good enough job of keeping bogofilter 
accurate, though, for some reason I've never been able to figure out, it 
has an ongoing habit of classifying messages from my wife as spam. ;-)

   jik