New script to train bogofilter

Wed Jul 2 09:57:45 CEST 2003

On Wed, 2 Jul 2003, Boris 'pi' Piwinger wrote:

> Boris 'pi' Piwinger wrote:
>
> > I wrote a perl script which trains bogofilter on error. It
> > produces very small databases. We'll have to see how well
> > that works. Any comments are warmly welcome.
>
> I reran my script until I got no errors. It was still
> extremely small: 352 spam and 291 ham
>
> So my first estimate: This works perfectly, we need far
> fewer messages in the database than we thought before. There
> seems to be no practical reason to avoid multiple
> classification of the same message.

If I understand correctly, you are correcting for mistakes over and over
again until bogofilter finally gets it right.

I confess that I do not understand all the bogomath, but I have always
wondered if high message counts in the database water down new input.
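
To make that worry concrete, here is a rough arithmetic sketch. It uses
a plain per-word ratio, spam count over total count, which is a
simplifying assumption and not bogofilter's actual calculation. The
function names and the counts are made up for illustration.

```python
def spam_share(spam_count, ham_count):
    """Fraction of a word's sightings that came from spam (simplified)."""
    return spam_count / (spam_count + ham_count)


def shift_from_one_spam(spam_count, ham_count):
    """How far the share moves when the word is seen in one more spam."""
    before = spam_share(spam_count, ham_count)
    after = spam_share(spam_count + 1, ham_count)
    return after - before


# Same starting share (0.2), but very different database sizes.
small = shift_from_one_spam(2, 8)      # 3/11 - 2/10
large = shift_from_one_spam(200, 800)  # 201/1001 - 200/1000

print(f"small database: share moves by {small:.4f}")
print(f"large database: share moves by {large:.4f}")
```

Under this simplified ratio, one extra spam message moves the word's
share far less in the large database than in the small one, which is
the dilution effect I am wondering about.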

Maybe what is needed is a 'super' spam/ham switch:

bogofilter --force -Ns < some-spammy-message

--force would keep repeating the action until the message was correctly
identified (in this case repeatedly adding the message to the spam
wordlist and removing it from the ham wordlist). Of course, in practice
people make lots of mistakes classifying spam (at least in a server-wide
install). Something like this would really magnify any mistake, but maybe
it could also be easily corrected. Seems like --force should be
incompatible with -u.
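
A minimal sketch of the loop I have in mind, written against a toy
in-memory word counter rather than bogofilter itself. Note that --force
is only a proposal here, not an existing option; ToyFilter, force_spam
and the 0.5 cutoff are all hypothetical names and values chosen for the
illustration, and the scoring is a crude average, not the real bogomath.

```python
from collections import Counter


class ToyFilter:
    """Tiny word-count classifier; a stand-in, NOT bogofilter's scoring."""

    def __init__(self):
        self.spam = Counter()
        self.ham = Counter()

    def score(self, tokens):
        # Average per-token spam share; unseen tokens count as neutral 0.5.
        if not tokens:
            return 0.5
        shares = []
        for t in tokens:
            s, h = self.spam[t], self.ham[t]
            shares.append(s / (s + h) if s + h else 0.5)
        return sum(shares) / len(shares)

    def is_spam(self, tokens):
        return self.score(tokens) > 0.5

    def move_to_spam(self, tokens):
        # Mirrors the idea of -Ns: add to spam list, remove from ham list.
        for t in tokens:
            self.spam[t] += 1
            self.ham[t] = max(self.ham[t] - 1, 0)


def force_spam(filt, tokens, max_rounds=50):
    """Repeat the spam registration until the message scores as spam.

    Returns the number of rounds needed; max_rounds caps a runaway loop.
    """
    rounds = 0
    while not filt.is_spam(tokens) and rounds < max_rounds:
        filt.move_to_spam(tokens)
        rounds += 1
    return rounds


filt = ToyFilter()
filt.ham.update({"cheap": 5, "pills": 3})  # words previously seen in ham
message = ["cheap", "pills"]
print("rounds needed:", force_spam(filt, message))
print("now classified as spam:", filt.is_spam(message))
```

The cap on rounds is there because a forced correction that never
converges would otherwise loop forever, and each extra round is exactly
the kind of magnification of a mistake mentioned above.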

-elijah