Dealing with wordlist mails

Geoff capsthorne at yahoo.co.uk
Wed Jan 28 14:02:52 CET 2004


On Wed, 28 Jan 2004 07:27:36 -0500
David Relson <relson at osagesoftware.com> wrote:

...
> 
> Would you rather lose an important message because your
> spam filter classified it wrong or would you rather have a
> few spam messages in your inbox?  The default score that
> bogofilter assigns to unknown and rarely seen words is
> 0.415, which causes it to favor delivery of spam (rather
> than loss of ham).  Bogofilter also has a min_dev value so
> that it will ignore words that score close to 0.5. 
> Min_dev's default value is 0.1, so bogofilter will ignore
> word scores between 0.4 and 0.6.
> 
> In practice, random words in spam messages have little
> effect.  If you want more detail on how bogofilter
> classified a 0.500000 message, run it with flags "-vv" and
> "-vvv".  The FAQ has info on the output generated with
> those flag settings.
> 
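
The min_dev rule quoted above can be illustrated with a rough
sketch. This is not bogofilter's code: the function name, the
dictionary of word scores, and the treatment of scores lying exactly
on the 0.4 / 0.6 boundary are assumptions made for illustration.
Only the two defaults (0.415 and 0.1) come from the message above,
and the sketch does not model how the surviving scores are combined
into a final result.

```python
ROBX = 0.415    # default score for unknown words, per the quoted message
MIN_DEV = 0.1   # default min_dev, per the quoted message


def significant_scores(words, known_scores, robx=ROBX, min_dev=MIN_DEV):
    """Return {word: score} for words scoring at least min_dev away from 0.5.

    Hypothetical helper: words missing from known_scores get the robx
    score, and anything too close to 0.5 is dropped.
    """
    kept = {}
    for word in words:
        score = known_scores.get(word, robx)
        if abs(score - 0.5) >= min_dev:
            kept[word] = score
    return kept


# Made-up scores for illustration only.
known = {"viagra": 0.99, "meeting": 0.05, "the": 0.52}
message = ["viagra", "meeting", "the", "xqzzyv"]
print(significant_scores(message, known))
```

Here "the" (0.52) and the unknown random word "xqzzyv" (scored at
0.415) both fall inside the ignored band, so only "viagra" and
"meeting" remain. Under the defaults, then, a never-seen random word
contributes nothing in this simplified picture.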

David,

Your work-rate on bogofilter is so high that I hesitate to
suggest that you should do the job, but surely the
random-words / min_dev / robx issue would benefit from an
entry in the FAQ?  Bogofilter is a success and is attracting
many users who, like me, have only the vaguest grasp of its
underlying theory.  The spammers' random-words technique
drives lots of us wild, and I think that there are many who
would be willing to accept a slight trade-off in accuracy in
favour of more effective first-hit filtration.

Geoff



