[bogofilter] misclassified emails in global wordlist

Chris Fortune cfortune at telus.net
Mon Apr 19 11:11:37 CEST 2004


I have a feedback website where users can log in and report misclassified emails.  It works very well to maintain the global
wordlist, except for one major problem: the users!  They will actually report some spam as non-spam and vice versa, either by
mistake, laziness, or other inscrutable motivations.  I've done my best to educate them, but what can you do?  Sometimes they hit
the wrong button.

As a precaution, I automatically put every reported email through a gauntlet of open-source content filters before registering it,
but the anomalies are not always successfully weeded out (and sometimes the content filters introduce their own errors).  The
wordlist is "polluted" by these incorrect registrations, but how bad is it?  How tolerant is the classifier?  When do I
have to be concerned?  I want to run it completely automatically.  Is administrative intervention always necessary to maintain the
wordlist correctly?  Has anybody run into the same problems?
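
To make the precaution concrete, here is a minimal sketch of such a gate.  It is not the code running on the site, only an
illustration.  It assumes each content filter can be wrapped so that it returns 'spam', 'ham', or 'unsure'; the function name
should_register and the min_agree threshold are hypothetical and are not bogofilter settings.

```python
from collections import Counter


def should_register(user_label, filter_verdicts, min_agree=2):
    """Decide whether a user's report is trusted enough to register.

    user_label      -- 'spam' or 'ham', as reported by the user
    filter_verdicts -- list of 'spam', 'ham', or 'unsure', one per filter
    min_agree       -- hypothetical threshold: how many filters must agree
    """
    if user_label not in ("spam", "ham"):
        raise ValueError("user_label must be 'spam' or 'ham'")

    counts = Counter(filter_verdicts)
    opposite = "ham" if user_label == "spam" else "spam"

    # Register only when enough filters back the user and they outnumber
    # the filters that contradict the user. 'unsure' votes count for neither.
    agree = counts[user_label]
    disagree = counts[opposite]
    return agree >= min_agree and agree > disagree
```

A report that fails the gate could be held for manual review rather than discarded, so that the filters' own errors do not
silently override a correct user report.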

Maybe the question is this: is a global wordlist with user input always cursed with fuzzy results?  To use it wisely, should I tweak
the configuration, and how?  What other precautions should I take?




