Getting "nan" in my verbose output
cwilkes-bf at ladro.com
Thu Apr 8 15:51:26 EDT 2004
On Thu, Apr 08, 2004 at 03:30:55PM -0400, David Relson wrote:
> On Thu, 8 Apr 2004 11:17:36 -0700
> Chris Wilkes wrote:
> > Hi all,
> > I just upgraded to version 0.17.5 and am looking at one of my users'
> > "makespam" folders, where they dump email to be classified as spam.
> > What's odd is that the spam values of the emails were all around 0.5
> > until I ran them through a "-Ns", and then suddenly the spam count was
> > 0! Digging deeper, I found the problem: my -Ns reduced the total good
> > count to 0.
> > Looking at one particular message I found the word "Pharmacy" in it.
> > $ bogofilter -vvv -I bademail.txt | grep -i Pharmacy
> > Word n pgood pbad fw U
> > "Pharmacy" 86 nan 0.002943 nan -
> > $ bogoutil -w ./wordlist.db Pharmacy
> > spam good
> > Pharmacy 86 0
> > $ bogoutil -w ./wordlist.db .MSG_COUNT
> > spam good
> > .MSG_COUNT 29226 0
> > What gives? This word has only been seen in spams (86 to 0) yet it
> > doesn't contribute to the spam count ("-" for U).
> Did you do a whole bunch of "-Ns" to cause the zero??? Do you think the
> good count went to zero appropriately, or inappropriately?
> As you've discovered the "nan" is a result of a zero message count.
> Bogofilter uses the message count as a divisor and the division by zero
> causes the problem. As a minor speedup, the "if 0, use 1 for division"
> check was deleted a while ago. I'll restore the check in the code to
> fix this for the next release.
Yes, you are correct. I did a bunch of -s's in a row on all the email
that she dumped into "makespam". After doing so, I take out the ones
that didn't register as spam and then do a -Ns on them (probably more
than once, too). Since she gets a lot more spam than ham, doing so could
be problematic, as a lot of people here dump correctly classified spam into
"makespam".
Now I'll just stick to doing a -s. Since I keep track of everyone's
"makegoods", I'm running those through everyone's wordlist with a -n to
ward off this problem.
Thanks for putting the check back in there. Hey, it could have been worse
if my -N had caused the good .MSG_COUNT to go negative :)