New script to train bogofilter

Boris 'pi' Piwinger 3.14 at logic.univie.ac.at
Thu Jul 3 13:00:43 CEST 2003


Peter Bishop wrote:

>> I wrote a perl script which trains bogofilter on error. It
>> produces very small databases. We'll have to see how well
>> that works. Any comments are warmly welcome.
> 
> It will be interesting to see how well this works. 

I hope some people will test it. I have a version here which
has a bit more information in the output. Mail me if you
want it.

> In principle it might mean that the same spam is submitted several times

Right (not during one run, though).
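For anyone not familiar with the idea, here is a minimal
sketch of train on error. It is Python, not the Perl of the
actual script, and ToyFilter is a made-up token counter
standing in for bogofilter, so none of the names below come
from the script itself. One call to train_on_error() is one
run: every message is looked at once and registered only if
the filter gets it wrong or is unsure.

```python
from collections import Counter


class ToyFilter:
    """Hypothetical stand-in for a real filter: per-class token counts."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}

    def classify(self, text):
        # Score a message by how often its tokens were seen in each class.
        tokens = set(text.lower().split())
        spam_score = sum(self.counts["spam"][t] for t in tokens)
        ham_score = sum(self.counts["ham"][t] for t in tokens)
        if spam_score > ham_score:
            return "spam"
        if ham_score > spam_score:
            return "ham"
        return "unsure"

    def train(self, text, label):
        # Register each distinct token of the message once for its class.
        self.counts[label].update(set(text.lower().split()))

    def size(self):
        # Number of (class, token) entries in the toy database.
        return sum(len(c) for c in self.counts.values())


def train_on_error(filt, messages):
    """One run: register only messages that are not classified correctly.

    Returns the indices of the messages that were used for training.
    """
    used = []
    for i, (text, label) in enumerate(messages):
        if filt.classify(text) != label:
            filt.train(text, label)
            used.append(i)
    return used


messages = [
    ("buy cheap pills", "spam"),
    ("meeting agenda attached", "ham"),
    ("cheap pills now", "spam"),
    ("agenda for the meeting", "ham"),
]
filt = ToyFilter()
print(train_on_error(filt, messages))  # first run: [0, 1]
print(train_on_error(filt, messages))  # second run: []
```

In this toy case a message is registered at most once per
run, and it can only be registered again in a later run if
the filter still gets it wrong at that point.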

> Maybe the script could "quarantine" spams that have already been
> used for training

Then you would need to preserve that information between
runs, which would make the script much more complicated. Or
you could easily modify the script to save the messages to
different files depending on whether they were used.
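A rough sketch of the second option, again in Python and
not taken from the script: split_by_use() and its arguments
are hypothetical names, and it assumes the training run has
already produced the set of indices of the messages it
registered. Messages are appended to two mbox files with
the standard mailbox module.

```python
import mailbox


def split_by_use(messages, used_indices, used_path, unused_path):
    """Append each raw message to one of two mbox files.

    messages     -- list of raw message strings (headers + body)
    used_indices -- indices of messages that were used for training
    Returns (number written to used_path, number written to unused_path).
    """
    used_box = mailbox.mbox(used_path)      # created if it does not exist
    unused_box = mailbox.mbox(unused_path)
    n_used = n_unused = 0
    try:
        for i, raw in enumerate(messages):
            if i in used_indices:
                used_box.add(raw)
                n_used += 1
            else:
                unused_box.add(raw)
                n_unused += 1
        # Write pending changes to disk before closing.
        used_box.flush()
        unused_box.flush()
    finally:
        used_box.close()
        unused_box.close()
    return n_used, n_unused
```

Since the files are appended to, running this repeatedly
over the same input would store duplicates. Avoiding that
is exactly the bookkeeping between runs mentioned above.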

> In practice I don't think this will be a big problem, as the train on error
> approach should preferentially select spams with unknown tokens
> rather than existing ones.

Right, but of course training on a message of the other
kind might add those same tokens to the other side.

> I think this approach should be evaluated because I would expect a "train
> on error" database to be more effective than a database trained
> with a similar number of "typical" messages.

At least the database is much smaller.

pi




