What is spam?

David Relson relson at osagesoftware.com
Wed May 12 13:52:24 CEST 2004


On 12 May 2004 07:30:15 -0400
Tom Anderson wrote:

> On Tue, 2004-05-11 at 19:27, David Relson wrote:
> > is valid.  I've got some doubts about the "over-sensitivity" clause.
> 
> It was my understanding that this was part of the impetus behind
> implementing this in the first place.  Registering the same tokens
> over and over again gives them more weight than they perhaps deserve,
> giving the wordlist increased "momentum" toward the same
> classifications. Desensitizing the wordlist allows quicker evolution
> in the face of new spam.  I'd imagine the space-saving effect is
> minimal since incrementing a counter does not require much, if any,
> additional space.
> 
> Tom

Tom,

thresh_update was implemented for database size reasons.  I don't worry
about momentum/inertia/sensitivity in the database and didn't even think
about that when I implemented it.
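
For readers unfamiliar with the option, here is a rough sketch of the idea as I understand it: when auto-updating, a message whose score is already within thresh_update of 0.0 or 1.0 is not registered again, so the wordlist grows more slowly. The function name and the exact comparisons below are illustrative assumptions, not bogofilter's actual C code.

```python
def should_register(spamicity: float, thresh_update: float) -> bool:
    """Decide whether an auto-update run should register a message's tokens.

    Hypothetical helper for illustration only; bogofilter's real
    implementation may compare the values differently.
    """
    if thresh_update <= 0.0:
        # Assumption: a threshold of 0.0 disables the feature entirely.
        return True
    near_ham = spamicity <= thresh_update          # already a confident ham
    near_spam = spamicity >= 1.0 - thresh_update   # already a confident spam
    return not (near_ham or near_spam)


if __name__ == "__main__":
    for score in (0.001, 0.5, 0.999):
        print(score, should_register(score, thresh_update=0.01))
```

With a threshold of 0.01, only the middle score would be registered; the two extreme scores are skipped because the wordlist already classifies those messages confidently.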

As of 0700 this morning, my wordlist has 1,335,216 tokens from 61,382
spam and 75,096 ham.  I accept the fact that messages from certain
sources are likely to score Unsure and that this is unlikely to change.

Since bogofilter is correctly classifying over 99% of my incoming mail
as spam or ham (with less than 1% Unsures), I'm happy with its
performance.

Regards,

David


