parameter experiment repeated with more data

Peter Bishop pgb at adelard.com
Thu Apr 17 22:16:12 CEST 2003


On 17 Apr 2003 at 13:28, Greg Louis wrote:

> The latest of my attempts to characterize the effects of varying
> Robinson's s and the minimum deviation parameter in bogofilter is a
> repeat of the previous one, with many more data.  The writeup at
> http://www.bgl.nu/bogofilter/smindev3.html has been updated
> accordingly.  It begins to appear as though it would be generally good
> for bogofilter to ship with s set to 0.1 and the minimum deviation as
> high as 0.44 -- though these settings may require a well-trained
> database (several thousand each of spam and nonspam messages) to be
> effective.

Could you clarify whether the training corpora were different from the
test corpora, i.e. did you split the spam and ham in half and use one
half for training and the other for testing?
Using different sets for training and testing might be more realistic,
as new spam won't be identical to the old spam.
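
To make the question concrete, here is a rough sketch of the kind of
split I have in mind. It is only an illustration: the function name
split_corpus, the 50/50 fraction, the fixed seed and the placeholder
message names are all my own assumptions, not anything taken from
bogofilter or from the writeup.

```python
import random


def split_corpus(messages, train_fraction=0.5, seed=0):
    """Shuffle a list of messages and split it into (train, test).

    Hypothetical helper: the two returned lists share no message,
    so nothing used for training is seen again during testing.
    """
    shuffled = list(messages)              # copy, leave the input untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed for a repeatable split
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Placeholder identifiers standing in for real message files (assumption).
spam = [f"spam-{i}" for i in range(10)]
ham = [f"ham-{i}" for i in range(10)]

spam_train, spam_test = split_corpus(spam)
ham_train, ham_test = split_corpus(ham)
```

The training halves would be used to build the word lists and the test
halves would only be scored. A split by date (older messages for
training, newer ones for testing) might be closer still to real use,
but that is a separate design choice.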



-- 
Peter Bishop 
pgb at adelard.com
pgb at csr.city.ac.uk





