auto-update in 0.16.2

Andreas Pardeike andreas at pardeike.net
Sat Jan 17 09:32:12 CET 2004


On 2004-01-17, at 02.23, David Relson wrote:

> <snip> Over 90%
> of them were "obvious" ham or spam, i.e. ham with scores < 0.01 or spam
> with scores > 0.99.  Since the messages were so easily categorized, it
> seems that there's little value in using them for training.  Introducing
> a config file option "thresh_update=0.01" and a corresponding command
> line option "-u 0.01" seemed the obvious way of dealing with this.
> Cutting the number of wordlist updates has the dual benefits of making
> bogofilter faster and slowing the growth of the database.

Being new to bogofilter but somewhat familiar with statistical systems,
I wonder whether it's a good strategy to skip training on easily
classified messages.

Isn't a new spam message that contains a few very well-known words
(and thus gets a high spam score) in fact good training material for
all the other details it contains? Training on it would allow bogofilter
to better catch variants of spamming techniques.
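
To make the concern concrete, here is a minimal sketch in Python. It is
not bogofilter's actual code: the function names, the toy wordlist, and
the example score are all assumptions made up for illustration. The only
thing taken from the proposal is the rule itself, i.e. skip the update
when the score is below the threshold or above one minus the threshold.

```python
def should_update(score, thresh_update=0.01):
    """Hypothetical rule: train only on messages that are not 'obvious'.

    A message is skipped when score < thresh_update (obvious ham)
    or score > 1 - thresh_update (obvious spam).
    """
    return thresh_update <= score <= 1.0 - thresh_update


def unseen_tokens(message_tokens, wordlist):
    """Tokens in the message that the wordlist has never seen."""
    return sorted(set(message_tokens) - set(wordlist))


# Toy data, invented for this example only.
wordlist = {"offer": 0.99, "free": 0.95, "meeting": 0.02}
message = ["offer", "free", "0ffer", "fr33"]
score = 0.995  # assumed score for this message

if not should_update(score):
    # The message is "obvious" spam, so its new variants are never learned.
    print("skipped; tokens not learned:", unseen_tokens(message, wordlist))
```

With these made-up numbers the message is skipped, and the two obfuscated
tokens it introduces never reach the wordlist, which is exactly the
effect I am asking about.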

Or am I totally off here?

Regards,
Andreas Pardeike

-- If no symptoms manifest, does a problem exist?




