why is this not spam?

David Relson relson at osagesoftware.com
Wed Aug 4 23:02:28 CEST 2004


On Wed, 4 Aug 2004 17:30:37 -0300
Trevor Smith wrote:

> So I've got bogofilter working, more or less, with KMail and a message
> with this header line ends up in my inbox:
> 
> X-Bogosity: No, tests=bogofilter, spamicity=0.987305, version=0.92.2
> 
> Why is a message with spamicity=0.987305 considered "No" in terms of
> bogosity?
> 
> I haven't set any spam or ham cutoff values (so they are at their
> defaults, I guess).
> 
> Doesn't the above qualify as a "very" likely spam? What am I missing?

Hello Trevor,

Bogofilter's default parameters were determined using the bogotune
program and a corpus of several hundred thousand messages from several
sources (personal email collections, 80-100 person businesses, etc).  To
minimize the likelihood of a false positive (and the potential loss of
an important message), the default spam_cutoff value was deliberately
set high.  The effect is that ham messages are unlikely to be classified
as spam, but some high scoring ("spammish") messages that fall just
below the cutoff, like your 0.987305 message, are classified as
non-spam.  This "lets through" more spam (undesirable), but results in
fewer false positives (desirable).
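
To illustrate the cutoff rule, here is a minimal Python sketch.  It is
not bogofilter's code, and the 0.99 cutoff is only an assumed value for
the example; check your own configuration for the real setting.  The
function and constant names are made up for illustration.

```python
# Assumed cutoff for illustration only; your bogofilter setup may differ.
ASSUMED_SPAM_CUTOFF = 0.99


def classify(spamicity: float, spam_cutoff: float = ASSUMED_SPAM_CUTOFF) -> str:
    """Return "Yes" when the score reaches the cutoff, otherwise "No"."""
    return "Yes" if spamicity >= spam_cutoff else "No"


# The score from the X-Bogosity header in the question.
print(classify(0.987305))        # below the assumed cutoff
print(classify(0.987305, 0.95))  # a lower cutoff flips the verdict
```

A score can look very spammish and still be reported as "No" as long as
it sits under whatever spam_cutoff is in effect.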

I've also found it useful to keep _all_ my recent mail.  Periodically I
rescore those old messages using the current wordlist and look at how
high the non-spam messages score.  Knowing the highest ham scores, I can
safely select a lower spam_cutoff value and still avoid false positives.
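
As a rough sketch of that review step, the Python below pulls the
spamicity out of X-Bogosity header lines and proposes a cutoff a little
above the highest non-spam score.  The header lines, the function names
and the 0.02 margin are all hypothetical; this is just one way to do
the bookkeeping, not something bogofilter provides.

```python
import re

# Matches the "spamicity=0.987305" field of an X-Bogosity header line.
SPAMICITY_RE = re.compile(r"spamicity=([0-9]*\.?[0-9]+)")


def spamicity_from_header(header_line):
    """Extract the spamicity value from an X-Bogosity line, or None."""
    match = SPAMICITY_RE.search(header_line)
    return float(match.group(1)) if match else None


def suggest_spam_cutoff(ham_scores, margin=0.02):
    """Highest known non-spam score plus a safety margin, capped at 1.0.

    The margin is a hypothetical choice; pick one that suits your mail.
    """
    if not ham_scores:
        raise ValueError("need at least one non-spam score")
    return min(max(ham_scores) + margin, 1.0)


# Made-up header lines standing in for rescored, known non-spam messages.
ham_headers = [
    "X-Bogosity: No, tests=bogofilter, spamicity=0.000012, version=0.92.2",
    "X-Bogosity: No, tests=bogofilter, spamicity=0.431000, version=0.92.2",
    "X-Bogosity: No, tests=bogofilter, spamicity=0.718500, version=0.92.2",
]
scores = [s for s in map(spamicity_from_header, ham_headers) if s is not None]
print(f"highest non-spam score: {max(scores):.6f}")
print(f"candidate spam_cutoff:  {suggest_spam_cutoff(scores):.4f}")
```

The larger and more recent the set of saved non-spam messages, the more
trustworthy the candidate value is likely to be.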

HTH,

David

P.S.  When you disagree with bogofilter's classification ("yes"/"no") of
a message, use it to further train bogofilter.  That will help
bogofilter do better in the future.


