garbage removal and 'outsiders noise'

Alejandro Dau adau at datamarkets.com.ar
Thu Apr 17 00:41:49 CEST 2003


On 16 Apr 2003 at 13:22, David Relson wrote:

> 
> Greetings Alejandro,
> 
> Welcome to the bogofilter mailing list.  We enjoy newcomers, especially ones 
> with new ideas and the skills to implement them and contribute the code.
> 
Thanks for your welcome. 

> 
> Our algorithm expert, Greg Louis, has done a variety of tests.  In his 
> tests, deleting hapaxes (the term for one-occurrence tokens) gives poorer 
> results than keeping them.  Of course, spam corpora vary, so your results 
> may differ.
> 
> He's also done some testing with varying values of the parameters used in 
> the fisher algorithm, specifically the values of min_dev, robs, and 
> spam_cutoff, to see how they affect bogofilter's accuracy.  Take a look at 
> his bogofilter website, www.bgl.nu/~glouis/bogofilter, for his 
> findings.  His most recent test, "Bogofilter parameters (continued)", shows 
> that using different parameters can have a major effect in making 
> bogofilter more accurate.
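
For readers new to the parameters mentioned above, here is a minimal Python sketch of the Robinson/Fisher combining scheme. It assumes the formulation from Gary Robinson's published description, not bogofilter's actual C code. robs is the weight given to a background probability robx (robx is an extra parameter assumed here, not named above). For a token seen in only one message (a hapax), the smoothed f(w) is pulled strongly toward robx, which is how robs governs the influence of hapaxes. min_dev drops tokens whose f(w) lies within min_dev of 0.5, and spam_cutoff is the score threshold. All function names and parameter values are hypothetical.

```python
import math


def chi2q(x2, dof):
    """Upper-tail probability P(X >= x2) for a chi-square variable with an
    even number of degrees of freedom, using the closed-form series."""
    if dof % 2:
        raise ValueError("dof must be even")
    m = x2 / 2.0
    term = math.exp(-m)
    total = term
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def token_prob(bad, good, nbad, ngood, robs, robx):
    """Robinson's smoothed spam probability f(w) for one token.

    bad, good:   number of spam / ham messages containing the token
    nbad, ngood: total spam / ham messages in the training database
    robs:        weight of the background probability robx (must be > 0)
    """
    b = bad / nbad if nbad else 0.0
    g = good / ngood if ngood else 0.0
    p = b / (b + g) if b + g > 0 else robx
    n = bad + good
    return (robs * robx + n * p) / (robs + n)


def fisher_score(probs, min_dev, eps=1e-9):
    """Combine token probabilities with Fisher's chi-square method.
    Tokens with |f - 0.5| < min_dev are ignored; the result lies in [0, 1],
    with values near 1 indicating spam."""
    used = [min(max(f, eps), 1.0 - eps)
            for f in probs if abs(f - 0.5) >= min_dev]
    if not used:
        return 0.5  # no informative tokens: neutral score
    n = len(used)
    h = chi2q(-2.0 * sum(math.log(f) for f in used), 2 * n)
    s = chi2q(-2.0 * sum(math.log(1.0 - f) for f in used), 2 * n)
    return (1.0 + h - s) / 2.0


if __name__ == "__main__":
    # Hypothetical parameter values, for illustration only.
    robs, robx, min_dev, spam_cutoff = 0.2, 0.5, 0.1, 0.95
    # (spam count, ham count) per token, out of 100 spam and 100 ham messages.
    counts = [(40, 1), (25, 0), (1, 0), (3, 30)]
    probs = [token_prob(b, g, 100, 100, robs, robx) for b, g in counts]
    score = fisher_score(probs, min_dev)
    print(f"score={score:.3f} spam={score >= spam_cutoff}")
```

Raising robs makes rare tokens count for less; raising min_dev discards more near-neutral tokens before they are combined.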

I've visited the site, great work indeed.

Have you or Greg run any tests using stopwords, like the ones POPFile 
uses? Do they help train bogofilter faster? By 'training faster', I mean 
needing fewer messages to 'train on error' bogofilter.
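
One way to make the question concrete is to count how many training updates a train-on-error loop needs, with and without a stopword filter. The Python sketch below does that. The stopword list, the tokenizer regex and the TinyFilter classifier are hypothetical stand-ins, not POPFile's list or bogofilter's classifier. A meaningful answer would require plugging in a real filter and a real corpus.

```python
import re
from collections import Counter

# Hypothetical stopword list for illustration; NOT POPFile's actual list.
STOPWORDS = frozenset({"the", "a", "an", "and", "of", "to", "in", "is", "you"})

# Hypothetical tokenizer: runs of lowercase letters, digits, '$' or '-'.
TOKEN_RE = re.compile(r"[a-z0-9$-]+")


def tokenize(text, stopwords=frozenset()):
    """Return the set of distinct tokens in text, minus any stopwords."""
    return {t for t in TOKEN_RE.findall(text.lower()) if t not in stopwords}


class TinyFilter:
    """Toy word-frequency classifier, only here to drive the loop below."""

    def __init__(self):
        self.spam, self.ham = Counter(), Counter()
        self.nspam = self.nham = 0

    def train(self, tokens, is_spam):
        if is_spam:
            self.spam.update(tokens)
            self.nspam += 1
        else:
            self.ham.update(tokens)
            self.nham += 1

    def classify(self, tokens):
        """True (spam) if the tokens are, on balance, more frequent in spam."""
        score = 0.0
        for t in tokens:
            score += self.spam[t] / max(self.nspam, 1)
            score -= self.ham[t] / max(self.nham, 1)
        return score > 0


def train_on_error(messages, stopwords=frozenset()):
    """Feed (text, is_spam) pairs through the filter, training only on
    misclassified messages; return how many training updates were made."""
    filt = TinyFilter()
    updates = 0
    for text, is_spam in messages:
        tokens = tokenize(text, stopwords)
        if filt.classify(tokens) != is_spam:
            filt.train(tokens, is_spam)
            updates += 1
    return updates


if __name__ == "__main__":
    corpus = [
        ("Buy cheap pills now", True),
        ("Minutes of the project meeting", False),
        ("Cheap pills for you", True),
        ("The meeting is moved to Friday", False),
    ]
    print("without stopwords:", train_on_error(corpus))
    print("with stopwords:   ", train_on_error(corpus, STOPWORDS))
```

Fewer updates on the stopword run would suggest faster train-on-error learning, but only for the corpus and classifier actually tested.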

Thanks
Alejandro
