My wordlist doesn't detect spam very well anymore
Jonathan Kamens
jik at kamens.us
Sun Feb 9 13:47:37 CET 2020
In my experience, you need to save a big corpus of known, recent spam
and ham messages and periodically run bogotune to determine the
currently optimal parameters for your .bogofilter.cf file. Personally, I
save copies of all spam and ham messages going back for months and run
bogotune once per month. My .MSG_COUNT is currently 66,915 ham and
114,479 spam.
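
As a rough illustration of the monthly tuning run (a sketch, not the actual setup described above), the bogotune command line could be assembled in Python like this. It assumes the corpus is available as message files that bogotune can read, and it relies only on bogotune's -n (ham) and -s (spam) options; the function names are hypothetical.

```python
import subprocess


def bogotune_command(ham_files, spam_files):
    """Build a bogotune command line from lists of ham and spam files.

    Each ham file is passed after -n and each spam file after -s.
    """
    if not ham_files or not spam_files:
        raise ValueError("bogotune needs both ham and spam input")
    cmd = ["bogotune"]
    for path in ham_files:
        cmd += ["-n", str(path)]
    for path in spam_files:
        cmd += ["-s", str(path)]
    return cmd


def run_bogotune(ham_files, spam_files):
    """Run bogotune and return whatever it prints on standard output."""
    result = subprocess.run(
        bogotune_command(ham_files, spam_files),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if bogotune fails
    )
    return result.stdout
```

The returned text still has to be reviewed before any parameters are copied into .bogofilter.cf; the sketch does not edit the config file.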
The way I have things set up, unsure messages get put into my ham corpus
automatically; when I reclassify ham or unsure messages as spam, they
get moved into my spam corpus automatically, and when I reclassify spam
as ham, it gets moved into the ham corpus automatically.
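
The routing rules above can be sketched in Python as follows. This is only an illustration under assumed names: the "ham" and "spam" directory names, the one-file-per-message layout, and both functions are hypothetical and not taken from the setup described here.

```python
import shutil
from pathlib import Path

# Hypothetical corpus directory names (assumptions for this sketch).
HAM_DIR = "ham"
SPAM_DIR = "spam"


def corpus_for(verdict, user_label=None):
    """Pick the corpus a message belongs in.

    verdict: bogofilter's classification: "spam", "ham", or "unsure".
    user_label: "spam" or "ham" if the user reclassified the message,
        otherwise None.
    """
    if user_label is not None:
        # A manual reclassification always overrides the filter's verdict.
        return SPAM_DIR if user_label == "spam" else HAM_DIR
    # Without a manual label, only a spam verdict goes to the spam corpus;
    # unsure messages are treated as ham.
    return SPAM_DIR if verdict == "spam" else HAM_DIR


def file_message(path, root, verdict, user_label=None):
    """Move one message file into the right corpus under root."""
    dest_dir = Path(root) / corpus_for(verdict, user_label)
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / Path(path).name
    shutil.move(str(path), str(dest))
    return dest
```

For example, file_message("msg1", "corpus", "ham", user_label="spam") would move a message that was filed as ham into corpus/spam after it has been reclassified.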
I have a wrapper around the bogotune script which goes through the large
corpus of messages that are about to be used for bogotune and, using the
current settings in .bogofilter.cf, warns me if it believes any of them
are misclassified. I review those and make sure they're all in the right
place before I feed them into bogotune, to ensure that I'm not feeding
any incorrect cues to bogotune.
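
A pre-check of this kind might look like the Python sketch below. It is not the wrapper described above, only an illustration: it assumes one message per file, pipes each message to bogofilter on standard input so that the current ~/.bogofilter.cf settings apply, and uses bogofilter's exit codes (0 for spam, 1 for ham, 2 for unsure). The function names and the injectable classify parameter are hypothetical.

```python
import subprocess
from pathlib import Path

# bogofilter exit codes: 0 = spam, 1 = ham, 2 = unsure (3 means an error).
EXIT_TO_VERDICT = {0: "spam", 1: "ham", 2: "unsure"}


def bogofilter_verdict(path):
    """Classify one message file using the current bogofilter settings."""
    with open(path, "rb") as fh:
        result = subprocess.run(
            ["bogofilter"], stdin=fh, stdout=subprocess.DEVNULL
        )
    if result.returncode not in EXIT_TO_VERDICT:
        raise RuntimeError(f"bogofilter failed on {path}")
    return EXIT_TO_VERDICT[result.returncode]


def find_suspects(corpus_dir, expected, classify=bogofilter_verdict):
    """List files whose verdict is the opposite of the corpus label.

    expected: "ham" or "spam", the label of everything in corpus_dir.
    Unsure verdicts are not reported; only outright disagreements are.
    """
    opposite = "ham" if expected == "spam" else "spam"
    suspects = []
    for path in sorted(Path(corpus_dir).iterdir()):
        if path.is_file() and classify(path) == opposite:
            suspects.append(path)
    return suspects
```

Calling find_suspects("corpus/ham", "ham") would then list the ham-corpus messages that the current settings score as spam, for manual review before the bogotune run.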
All of this together seems to do a good enough job of keeping bogofilter
accurate, though, for some reason I've never been able to figure out, it
has an ongoing habit of classifying messages from my wife as spam. ;-)
jik