procmail: Non-zero exitcode (1) from "/usr/bin/bogofilter"

David Relson relson at osagesoftware.com
Fri Sep 14 13:11:24 CEST 2007


On Thu, 13 Sep 2007 07:57:20 -0400
dhottinger at harrisonburg.k12.va.us wrote:

> Quoting David Relson <relson at osagesoftware.com>:
> 
> > On Thu, 13 Sep 2007 06:35:05 -0400
> > dhottinger at harrisonburg.k12.va.us wrote:
> >
> > ..[snip]...
> >> Thanks,
> >> Isn't there a way to get the number of spam tokens and ham tokens,
> >> some kind of ratio from my wordlist.db?  Is it possible that my
> >> wordlist just got out of whack?
> >
> > "bogoutil -p ..../wordlist.db .MSG_COUNT" will show the ratio of
> > registered spam to ham messages.
> >
> > Actually counting spam vs ham tokens is tougher as each token has
> > its spam and ham counts stored with it (as the "tail" experiment
> > showed).
> >
> > "Pure spam" tokens would have "good" counts of 0, etc.  Most tokens
> > have both "good" and "bad" counts, as the following shows
> >
> >   bogoutil -p ... Dwayne "from:Dwayne " "to:Dwayne "
> >
> > Cheers!
> >
> > David
> >
> I ran    bogoutil -p ..../wordlist.db .MSG_COUNT
> spam      good    Fisher
> 111746    0       nan
> 
> Not sure what Fisher is, but I upgraded to the latest version of  
> bogofilter this morning.  Looks like I have no good counts in my  
> wordlist.  Wonder what happens if I feed bogofilter with some good  
> email?  Perhaps using bogofilter -nv < /path/to mailbox?

"Fisher" refers to the "Robinson-Fisher" variation for generating a
message's final score and is a relic of the days when bogofilter had a
trio of scoring methods, i.e. Graham, Robinson, and Robinson-Fisher.

The "nan" value means "not a number" and indicates a division by zero.
The zero "good" count shows that something is seriously wrong: no ham
has been registered in the wordlist.
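
To illustrate how a zero "good" count can turn into "nan", here is a
minimal Python sketch.  It is NOT bogofilter's actual code: the
function names are hypothetical, and the formula (a token's spam
frequency divided by the sum of its spam and good frequencies) is only
an assumed stand-in for whatever bogoutil computes in the "Fisher"
column.  Python raises an exception on float division by zero, so the
helper mimics the IEEE 754 behaviour that C code would show.

```python
import math

def ieee_div(num, den):
    """Divide like IEEE 754 floating point: 0/0 -> nan, x/0 -> +/-inf."""
    if den == 0:
        if num == 0 or math.isnan(num):
            return math.nan
        return math.copysign(math.inf, num)
    return num / den

def spam_probability(spam_count, good_count, spam_msgs, good_msgs):
    """Assumed per-token estimate, for illustration only.

    spam_count / good_count: how often the token was seen in each class.
    spam_msgs / good_msgs:   the registered message totals (.MSG_COUNT).
    """
    spam_freq = ieee_div(spam_count, spam_msgs)
    good_freq = ieee_div(good_count, good_msgs)   # 0/0 when no ham is registered
    return ieee_div(spam_freq, spam_freq + good_freq)

# With the counts from the quoted output, the 0/0 term poisons the result.
print(spam_probability(111746, 0, 111746, 0))     # nan
```

Once any term in the calculation is nan, every later step that uses it
is nan as well, which is why a single empty "good" count is enough to
make the reported value meaningless.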

Bogofilter needs both good and spam email to work properly.  With a
"good" count of zero, it can't work.  Feeding it a batch of ham would
certainly help.  Ideally there's a reasonable balance of ham to spam.
Though there's no precise ratio that defines "balance", anything no
more lopsided than 1:10 (ham to spam) will likely work.  With 111,746
spam registered, that means about 11,000 ham.  Do you have that many to
train with?  What might work a lot better is to check the wordlist.db
files on your backup tapes for a wordlist with reasonable .MSG_COUNT
values.
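
The figure of roughly 11,000 ham comes from applying the 1:10 rule of
thumb to the 111,746 registered spam.  A small Python sketch of that
arithmetic follows; the function name and its parameters are
hypothetical, and the limit of 10 spam per ham is only the rough
guideline above, not a bogofilter setting.

```python
def ham_needed(spam_count, good_count=0, spam_per_ham=10):
    """Return how many more ham messages bring the wordlist to at most
    `spam_per_ham` spam for every ham (a rule of thumb, not a hard limit)."""
    # Ceiling division with integers only, so no float rounding is involved.
    target = -(-spam_count // spam_per_ham)
    return max(0, target - good_count)

# Counts taken from the .MSG_COUNT output quoted above.
print(ham_needed(111746))   # 11175
```

The same function can be rerun with the .MSG_COUNT values of any
wordlist.db recovered from backup to see whether it is already within
that rough balance.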

HTH,

David


