nan in bogofilter stats

David Relson relson at osagesoftware.com
Tue Apr 7 05:14:59 CEST 2009


On Tue, 7 Apr 2009 10:37:21 +0930
Stephen Davies wrote:

> Thanks David. I shall set up a similar cron job.
> 
> The zero count does not seem to come from a missing .MSG_COUNT but
> from an explicit entry such as:
> 
> .MSG_COUNT 276248 0 20090322
> 
> How can this come about?
> 
> Cheers,
> Stephen

Off-hand ???  I dunno.

.MSG_COUNT is bogofilter's special token for keeping track of how many
good and bad messages have been registered into the wordlist.  As a
quick rule of thumb, it should have a good count higher than _any_
other token's good count and a bad count higher ...
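
That rule of thumb can be checked mechanically. Below is a minimal Python
sketch, not anything shipped with bogofilter. It assumes wordlist lines in
the form quoted above (token, spam count, good count, date, separated by
whitespace). The function names parse_line and check_msg_count are
hypothetical, and every sample token other than .MSG_COUNT is made up for
illustration.

```python
def parse_line(line):
    """Split 'token spam_count good_count yyyymmdd' into typed fields.

    Assumed field order: spam count first, good count second, as in the
    .MSG_COUNT line quoted in this thread.
    """
    token, spam, good, date = line.rsplit(None, 3)
    return token, int(spam), int(good), date


def check_msg_count(lines):
    """Return the tokens whose spam or good count exceeds .MSG_COUNT's."""
    counts = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        token, spam, good, _date = parse_line(line)
        counts[token] = (spam, good)

    msg_spam, msg_good = counts[".MSG_COUNT"]
    suspicious = [
        token
        for token, (spam, good) in counts.items()
        if token != ".MSG_COUNT" and (spam > msg_spam or good > msg_good)
    ]
    return sorted(suspicious)


sample = [
    ".MSG_COUNT 276248 0 20090322",  # the line quoted in this thread
    "viagra 5120 0 20090321",        # made-up token
    "meeting 12 340 20090320",       # made-up token
]
print(check_msg_count(sample))
```

With a good count of zero in .MSG_COUNT, any token that has ever been seen
in ham shows up as suspicious, which is a quick sign that the counter and
the rest of the wordlist disagree.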

In normal usage as more and more messages are registered with
bogofilter the counts just keep increasing.  The exception is that
bogofilter's -N and -S flags cause a decrease.  

A scenario that would give a zero "good" count is: (1) register just a
few ham messages and (2) unregister that same number of ham messages.
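
To make that scenario concrete, here is a toy Python model of the two
counters. It is only a sketch: the MsgCount class is hypothetical and is
not bogofilter's real code, the clamping at zero is my assumption, and the
figure of 3 ham messages is invented. Only the spam count comes from the
quoted line.

```python
class MsgCount:
    """Toy model of the .MSG_COUNT spam/good counters (illustration only)."""

    def __init__(self):
        self.spam = 0
        self.good = 0

    def register(self, spam=0, good=0):
        """Registering messages raises the counts."""
        self.spam += spam
        self.good += good

    def unregister(self, spam=0, good=0):
        """Unregistering (the -S and -N case) lowers the counts.

        Assumption: counts are clamped so they never go below zero.
        """
        self.spam = max(0, self.spam - spam)
        self.good = max(0, self.good - good)


mc = MsgCount()
mc.register(spam=276248)  # lots of spam registered over time
mc.register(good=3)       # (1) register just a few ham messages
mc.unregister(good=3)     # (2) unregister that same number of ham messages
print(".MSG_COUNT", mc.spam, mc.good)
```

The final state has a large spam count and a good count of zero, which is
the same shape as the entry Stephen quoted.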

If one registers large quantities of the incoming ham and spam,
then .MSG_COUNT won't ever have a ham or spam count that's even close
to zero.

Some interesting questions are:  How do you use bogofilter?  Do
you ever register ham?  Do you just register spam?  Do you use the "-N"
flag?

For what it's worth, I've been using the "-u" (auto-update) flag for 6
or so years.  My mail server's .MSG_COUNT token has spam and ham counts
of 1,624,551 and 181,079.  The 10:1 ratio is higher than we
recommended (6 or 7 years ago) but gives good results (with approx
99.9% of the spam getting caught).  Of course, "-u" has a caveat:  one
must be conscientious about correcting errors (false positives and
false negatives).

Anyhow, I've rambled on long enough.

Hope that this is of value to you.

Regards,

David


