Training from scratch.

Tom Eastman tom-lists at celleste.no-ip.org
Thu Jul 15 13:19:58 CEST 2004


I'm considering rebuilding my bogofilter database from scratch.  All email 
just gets passed through 'bogofilter -e -u -etc...' and all 
misclassifications get corrected quickly.  Hopefully it won't take long for 
the accuracy to get reasonably good.

I've never really played with the various constants you can set for the 
calculations but I was wondering... do you think it might be appropriate to 
set the min_dev to a very low value when your database is still very small?  
That way more tokens will be taken into account, and you can set it higher 
once you have a better range of tokens in your database.
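To make the min_dev idea concrete, here is a rough sketch of how a min_dev cutoff filters tokens. It assumes the usual Robinson-style rule: a token counts toward the score only if its spamicity f(w) is at least min_dev away from the neutral 0.5. The function name select_tokens and the sample spamicities are hypothetical, and this is not bogofilter's actual code:

```python
def select_tokens(spamicity, min_dev):
    """Keep only tokens whose spamicity is at least min_dev away from 0.5.

    spamicity: dict mapping token -> f(w) in [0, 1] (hypothetical input).
    Assumed rule; bogofilter's exact comparison may differ.
    """
    return {tok: f for tok, f in spamicity.items() if abs(f - 0.5) >= min_dev}


# Toy spamicities for one message scored against a small database.
scores = {"viagra": 0.99, "meeting": 0.10, "the": 0.52, "offer": 0.70}

print(sorted(select_tokens(scores, 0.375)))  # ['meeting', 'viagra']
print(sorted(select_tokens(scores, 0.01)))   # ['meeting', 'offer', 'the', 'viagra']
```

With a low min_dev, weakly indicative tokens such as "offer" and "the" also contribute. That is the trade-off being asked about: more evidence while the database is small, at the cost of more noise.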

Does this sound like a plan?

My other question concerns thresh_update.  I want to set this to a very low 
value so that emails that score very close to 0.00 or 1.00 are not added to
the database, but it occurred to me that when the database is very small it's 
quite possible that a spam could in fact get a score of 0.000.  If I then 
correct the classification with my script it might try to unregister from 
nonspam something that wasn't actually registered as nonspam to begin with!
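The concern can be sketched as a simple decision rule. This assumes that with -u, a message scoring within thresh_update of 0.0 or 1.0 is skipped rather than registered. The function name should_autoupdate is hypothetical, and the sketch is not taken from bogofilter's source:

```python
def should_autoupdate(score, thresh_update):
    """Return True if a message with this score would be auto-registered.

    Assumed rule: scores within thresh_update of 0.0 or 1.0 are treated as
    already well known and are NOT added to the database.
    """
    return thresh_update <= score <= 1.0 - thresh_update


# A spam that a small database wrongly scores as 0.000 is skipped by -u...
print(should_autoupdate(0.0, 0.01))    # False
# ...while an uncertain message in the middle is registered.
print(should_autoupdate(0.5, 0.01))    # True
print(should_autoupdate(0.999, 0.01))  # False
```

Under this rule, a misclassified spam that scored 0.000 was never registered as nonspam, so a later correction with 'bogofilter -Ns' would try to unregister counts that were never added.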

Sorry, it's late and that last paragraph wasn't very clear.  To summarize:  
Will bad things happen if I register a spam with 'bogofilter -Ns' when the 
spam wasn't actually registered with 'bogofilter -n' in the first place?

Thanks!

	Tom


