Dealing with wordlist mails

Lars Clausen lc at statsbiblioteket.dk
Thu Jan 29 16:00:26 CET 2004


On Wed, 2004-01-28 at 13:27, David Relson wrote:
> On Wed, 28 Jan 2004 13:13:18 +0100
> Lars Clausen wrote:
[...]
> Good thoughts, but ...  
> 
> Would you rather lose an important message because your spam filter
> classified it wrong or would you rather have a few spam messages in your
> inbox?  

I would rather lose an important message than have a few dozen spam
messages in my inbox, but that's another discussion.

> The default score that bogofilter assigns to unknown and rarely
> seen words is 0.415, which causes it to favor delivery of spam (rather
> than loss of ham).  Bogofilter also has a min_dev value so that it will
> ignore words that score close to 0.5.  Min_dev's default value is 0.1,
> so bogofilter will ignore word scores between 0.4 and 0.6.
> 
> In practice, random words in spam messages have little effect.  If you
> want more detail on how bogofilter classified a 0.500000 message, run it
> with flags "-vv" and "-vvv".  The FAQ has info on the output generated
> with those flag settings.

I tried that, but the words have since made it into the DB, so I no
longer see the same effect.

On the bright side, it looks like I'm over the peak WRT wordlist spams. 
I'm guessing they only have so many words in their wordlist, so now
bogofilter is catching up.

The next obvious development (I believe I've already seen a few
examples) is to use randomly generated "words", to fill up the
bogofilter DBs and take advantage of the min_dev issues.  I'll have to
watch for that.  Now that I have sorting by bogosity implemented in
Gnus, I can easily find the wordlist mails and keep an eye on them.
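
To make the min_dev behaviour quoted above concrete, here is a minimal
sketch of the filtering step only.  It is not bogofilter's actual code:
the function name, the sample words and their scores are hypothetical,
and the handling of scores exactly on the 0.4/0.6 boundary is an
assumption.  The constants 0.415 and 0.1 are the defaults David gives.

```python
# Illustrative sketch only; not taken from the bogofilter source.
UNKNOWN_SCORE = 0.415  # score for unknown/rarely seen words (quoted default)
MIN_DEV = 0.1          # quoted default for min_dev


def significant_scores(word_scores, min_dev=MIN_DEV):
    """Keep only words whose score is at least min_dev away from 0.5.

    Boundary handling (>= rather than >) is an assumption.
    """
    return {
        word: score
        for word, score in word_scores.items()
        if abs(score - 0.5) >= min_dev
    }


# Hypothetical per-word scores for one message.
scores = {
    "viagra": 0.98,            # strongly spammy, kept
    "meeting": 0.05,           # strongly hammy, kept
    "xqzvtrp": UNKNOWN_SCORE,  # random "word" never seen before
    "the": 0.5,                # neutral
}

kept = significant_scores(scores)
print(sorted(kept))  # ['meeting', 'viagra']
```

With these defaults a never-seen random word scores 0.415, which falls
inside the ignored band, so it contributes nothing to the message score
until training moves its score out of that band.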

-Lars




