procmail: Non-zero exitcode (1) from "/usr/bin/bogofilter"
David Relson
relson at osagesoftware.com
Fri Sep 14 13:11:24 CEST 2007
On Thu, 13 Sep 2007 07:57:20 -0400
dhottinger at harrisonburg.k12.va.us wrote:
> Quoting David Relson <relson at osagesoftware.com>:
>
> > On Thu, 13 Sep 2007 06:35:05 -0400
> > dhottinger at harrisonburg.k12.va.us wrote:
> >
> > ..[snip]...
> >> Thanks,
> >> Isn't there a way to get the number of spam tokens and ham tokens,
> >> some kind of ratio from my wordlist.db? Is it possible that my
> >> wordlist just got out of whack?
> >
> > "bogoutil -p ..../wordlist.db .MSG_COUNT" will show the ratio of
> > registered spam to ham messages.
> >
> > Actually counting spam vs ham tokens is tougher as each token has
> > its spam and ham counts stored with it (as the "tail" experiment
> > showed).
> >
> > "Pure spam" tokens would have "good" counts of 0, etc. Most tokens
> > have both "good" and "bad" counts, as the following shows
> >
> > bogoutil -p ... Dwayne "from:Dwayne " "to:Dwayne "
> >
> > Cheers!
> >
> > David
> >
> I ran bogoutil -p ..../wordlist.db .MSG_COUNT
> spam good Fisher
> 111746 0 nan
>
> Not sure what Fisher is, but I upgraded to the latest version of
> bogofilter this morning. Looks like I have no good counts in my
> wordlist. Wonder what happens if I feed bogofilter with some good
> email? Perhaps using bogofilter -nv < /path/to mailbox?
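
The per-token layout described in the quoted exchange (each token carries its own spam and good counts) suggests one way to approximate the original question about counting spam versus ham tokens. What follows is a hypothetical Python helper, not part of bogofilter. It assumes `bogoutil -p` output arrives as whitespace-separated lines ending in spam, good and Fisher columns, matching the header shown in the thread. It also assumes that entries whose names begin with "." are bookkeeping records such as .MSG_COUNT rather than real tokens. The sample lines and their Fisher values are made up for illustration.

```python
from collections import Counter


def classify(spam: int, good: int) -> str:
    """Bucket one token by which message classes it has been seen in."""
    if spam > 0 and good == 0:
        return "pure spam"
    if good > 0 and spam == 0:
        return "pure ham"
    if spam > 0 and good > 0:
        return "mixed"
    return "unused"


def tally(lines):
    """Count pure-spam, pure-ham and mixed tokens in bogoutil -p style lines."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # header ("spam good Fisher") or malformed line
        token = " ".join(fields[:-3])
        if token.startswith("."):
            continue  # assumed bookkeeping entry such as .MSG_COUNT
        try:
            spam, good = int(fields[-3]), int(fields[-2])
        except ValueError:
            continue  # non-numeric count columns; skip the line
        counts[classify(spam, good)] += 1
    return counts


# Made-up sample lines, not real bogoutil output.
sample = [
    "spam good Fisher",
    ".MSG_COUNT 100 50 0.333",
    "viagra 40 0 0.99",
    "meeting 0 12 0.01",
    "Dwayne 3 20 0.2",
]
print(tally(sample))
```

Note that this counts distinct tokens, not occurrences. Per the quoted explanation, most tokens in a healthy wordlist land in the "mixed" bucket.
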
"Fisher" refers to the Robinson-Fisher method of computing a message's
final score. The column name is a relic of the days when bogofilter
offered three scoring methods: Graham, Robinson, and Robinson-Fisher.

The "nan" value means "not a number" and indicates a division by zero.
The zero "good" count indicates that something is significantly wrong.
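
The following sketch shows how a zero ham message count turns into "nan". It assumes a Robinson-style per-token ratio, p = b / (b + g), where b and g are the fractions of spam and ham messages containing the token. The thread doesn't show bogoutil's exact formula for the Fisher column, so treat this as an illustration rather than bogofilter's code. Python raises ZeroDivisionError on 0/0, so the hypothetical helper `ieee_div` mimics C double arithmetic, which yields NaN instead.

```python
import math


def ieee_div(numer: float, denom: float) -> float:
    """Divide like C doubles: 0/0 and NaN/0 give NaN, x/0 gives +/-inf."""
    if denom == 0:
        if numer == 0 or math.isnan(numer):
            return math.nan
        return math.copysign(math.inf, numer)
    return numer / denom  # NaN operands propagate without raising


def token_spamicity(spam_count, good_count, spam_msgs, good_msgs):
    """Robinson-style p = b / (b + g) for a single token (assumed formula)."""
    b = ieee_div(spam_count, spam_msgs)  # fraction of spam messages with token
    g = ieee_div(good_count, good_msgs)  # fraction of ham messages with token
    return ieee_div(b, b + g)


# .MSG_COUNT-like values from the thread: 111746 spam, 0 good messages.
print(token_spamicity(111746, 0, 111746, 0))  # nan
```

With zero registered ham messages, g is 0/0, and the NaN then propagates through every later step.
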

Bogofilter needs both good (ham) and spam email to work properly. With a
good count of zero, it can't work at all. Feeding it a batch of ham
would certainly help. Ideally there's a reasonable balance of ham to
spam. There's no precise ratio that defines "balance", but anything less
lopsided than 1:10 will likely work. Do you have 11,000 ham messages
(roughly a tenth of your 111,746 spam) to train with? What might work a
lot better is to check the wordlist.db files on your backup tapes for
one with reasonable .MSG_COUNT values.
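
The balance advice above reduces to simple arithmetic. The sketch below is a hypothetical Python check, not a bogofilter feature. The `max_skew=10` default encodes the 1:10 rule of thumb from this message, not any bogofilter setting. `parse_counts` assumes the spam and good counts are the last numeric columns before Fisher, as in the output quoted earlier.

```python
def parse_counts(line: str) -> tuple[int, int]:
    """Pull (spam, good) from a line such as '111746 0 nan' (assumed layout)."""
    fields = line.split()
    return int(fields[-3]), int(fields[-2])


def is_balanced(spam: int, ham: int, max_skew: int = 10) -> bool:
    """True if neither class outnumbers the other by more than max_skew."""
    if spam == 0 or ham == 0:
        return False  # one class missing entirely, as in this thread
    return max(spam, ham) <= max_skew * min(spam, ham)


def ham_needed(spam: int, ham: int, max_skew: int = 10) -> int:
    """Extra ham messages required to bring spam:ham within max_skew:1."""
    target = -(-spam // max_skew)  # ceiling of spam / max_skew
    return max(0, target - ham)


spam, ham = parse_counts("111746 0 nan")
print(is_balanced(spam, ham), ham_needed(spam, ham))  # False 11175
```

The 11,175 figure is where the "11,000 ham" estimate comes from. Training on fewer than that still helps, but it leaves the wordlist more lopsided than the rule of thumb suggests.
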
HTH,
David