procmail: Non-zero exitcode (1) from "/usr/bin/bogofilter"

David Relson relson at osagesoftware.com
Mon Sep 17 04:55:16 CEST 2007


On Fri, 14 Sep 2007 08:26:59 -0400
dhottinger at harrisonburg.k12.va.us wrote:

> Quoting David Relson <relson at osagesoftware.com>:
> 
> > On Thu, 13 Sep 2007 07:57:20 -0400
> > dhottinger at harrisonburg.k12.va.us wrote:
> >
> >> Quoting David Relson <relson at osagesoftware.com>:
> >>
> >> > On Thu, 13 Sep 2007 06:35:05 -0400
> >> > dhottinger at harrisonburg.k12.va.us wrote:
> >> >
> >> > ..[snip]...
> >> >> Thanks,
> >> >> Isn't there a way to get the number of spam tokens and ham
> >> >> tokens, some kind of ratio from my wordlist.db?  Is it possible
> >> >> that my wordlist just got out of whack?
> >> >
> >> > "bogoutil -p ..../wordlist.db .MSG_COUNT" will show the ratio of
> >> > registered spam to ham messages.
> >> >
> >> > Actually counting spam vs ham tokens is tougher as each token has
> >> > its spam and ham counts stored with it (as the "tail" experiment
> >> > showed).
> >> >
> >> > "Pure spam" tokens would have "good" counts of 0, etc.  Most
> >> > tokens have both "good" and "bad" counts, as the following shows:
> >> >
> >> >   bogoutil -p ... Dwayne "from:Dwayne " "to:Dwayne "
> >> >
> >> > Cheers!
> >> >
> >> > David
> >> >
> >> I ran "bogoutil -p ..../wordlist.db .MSG_COUNT" and got:
> >> spam      good    Fisher
> >> 111746    0       nan
> >>
> >> Not sure what Fisher is, but I upgraded to the latest version of
> >> bogofilter this morning.  Looks like I have no good counts in my
> >> wordlist.  Wonder what happens if I feed bogofilter with some good
> >> email?  Perhaps using bogofilter -nv < /path/to/mailbox?
> >
> > "Fisher" refers to the "Robinson-Fisher" variation for generating a
> > message's final score and is a relic of the days when bogofilter
> > had a trio of scoring methods, i.e. Graham, Robinson, and
> > Robinson-Fisher.
> >
> > The "nan" value means "not a number", indicating a division by
> > zero.  The zero "good" count indicates something significantly
> > wrong.
> >
> > Bogofilter needs both good and spam email to work properly.  With a
> > "zero" good count, it can't work.  Certainly feeding a bunch of ham
> > to it would help.  Ideally there's a reasonable balance of ham to
> > spam.  Though there's no precise proper ratio for "balance", a
> > ham:spam ratio no worse than 1:10 will likely work.  Have you 11,000
> > ham to train with?  What might work a lot better is to check the
> > wordlist.db files on your backup tapes for a wordlist with
> > reasonable .MSG_COUNT values.
> >
> > HTH,
> >
> > David
> >
> 
> After I sent the email, I fed several users' mailboxes (after
> checking them for spam) into bogofilter as ham.  This seems to have
> helped quite a bit and put things back into focus.  I've been trying
> to feed both, and webmail has a "report as innocent" option, which
> very few users use.  It puts emails into a non-spam mailbox, which I
> then import into bogofilter using
> "bogofilter -nv < /var/local/not-spam".  I usually get only 1-3
> emails a month reported as innocent, though.  Emails that sneak
> through get reported as spam and imported using
> "bogofilter -Nsv < /var/local/imp-spam".  I'm thinking I should
> change this and use "bogofilter -sv" instead.  Maybe things will stay
> a little closer to center then.  I really appreciate all the
> information.  It helps to get an expert opinion.
> 
> ddh
> 
> -- Dwayne Hottinger
> Network Administrator
> Harrisonburg City Public Schools
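
As a rough illustration of the numbers in the quoted exchange, here is a small Python sketch. It is a simplified model rather than bogofilter's code: it assumes the "Fisher" column is approximately Robinson's per-token probability p = (b/B) / (b/B + g/G), where b and g are a token's spam and good counts and B and G are the spam and good message totals, and it omits bogofilter's smoothing parameters. The function names are hypothetical.

```python
import math


def safe_div(num: float, den: float) -> float:
    """Division that yields NaN on a zero denominator instead of raising."""
    # Python raises ZeroDivisionError for x / 0; C floating-point code would
    # produce NaN (for 0/0) or infinity, so this sketch returns NaN for any
    # zero denominator to mark the result as undefined.
    if den == 0:
        return math.nan
    return num / den


def robinson_p(bad: int, good: int, bad_msgs: int, good_msgs: int) -> float:
    """Simplified, unsmoothed Robinson token probability (hypothetical helper)."""
    bad_ratio = safe_div(bad, bad_msgs)     # spam count relative to spam total
    good_ratio = safe_div(good, good_msgs)  # ham count relative to ham total
    # NaN propagates: 1.0 + nan is nan, and 1.0 / nan is nan.
    return safe_div(bad_ratio, bad_ratio + good_ratio)


def ham_needed(spam_msgs: int, spam_per_ham: int = 10) -> int:
    """Fewest ham messages keeping ham:spam no worse than 1:spam_per_ham."""
    return math.ceil(spam_msgs / spam_per_ham)


spam, good = 111746, 0  # the .MSG_COUNT totals reported in the thread
print(robinson_p(spam, good, spam, good))  # nan: the ham side is 0/0
print(ham_needed(spam))                    # 11175
```

With a non-zero ham total the same formula gives ordinary values; for example, robinson_p(10, 0, 100, 50) is 1.0, which is what a "pure spam" token looks like.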


"-N -s" is appropriate when (1) the message was incorrectly entered into
the wordlist as ham, i.e. using "-n", (2) it should have been entered as
spam, and (3) you're correcting the situation.

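Most often "-Ns" is used when "-u" (the auto-update option) is part of
bogofilter's default invocation.  With auto-update, a message classified
as ham is also registered as ham, so "-Ns" is the way to fix a false
negative (spam incorrectly classified as ham): it removes the ham
registration and adds a spam one.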
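
The bookkeeping behind these options can be illustrated with a toy model of the .MSG_COUNT totals. It is a sketch under stated assumptions, not bogofilter's implementation: the class and method names are hypothetical, each method stands in for one command-line flag ("-n" register ham, "-s" register spam, "-N" unregister ham, "-S" unregister spam), only message totals are tracked (not per-token counts), decrements are assumed to stop at zero, and "-u" is simplified to a two-way ham/spam decision.

```python
from dataclasses import dataclass


@dataclass
class MsgCount:
    """Toy stand-in for the .MSG_COUNT spam/good totals (hypothetical)."""
    spam: int = 0
    good: int = 0

    def register_ham(self) -> None:     # like "-n"
        self.good += 1

    def register_spam(self) -> None:    # like "-s"
        self.spam += 1

    def unregister_ham(self) -> None:   # like "-N"; assumed to stop at zero
        self.good = max(0, self.good - 1)

    def unregister_spam(self) -> None:  # like "-S"; assumed to stop at zero
        self.spam = max(0, self.spam - 1)

    def auto_update(self, classified_as_spam: bool) -> None:
        """Like "-u": register the message according to its classification."""
        if classified_as_spam:
            self.register_spam()
        else:
            self.register_ham()


counts = MsgCount(spam=100, good=50)
counts.auto_update(classified_as_spam=False)  # a false negative: spam scored as ham
counts.unregister_ham()                       # "-N": undo the ham registration...
counts.register_spam()                        # "-s": ...and register it as spam
print(counts)  # MsgCount(spam=101, good=50)
```

The ham total ends where it started because "-N" only cancels the registration that "-u" made; the spam total gains the message it should have had all along.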

If a message has _not_ been registered as ham, then using "-N" is
WRONG.  Using "-Ns" frequently (and incorrectly) could force the ham
count down to zero (which is a very BAD thing), because each "-N"
subtracts a ham registration that never happened.
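
A minimal Python sketch of that failure mode, assuming (hypothetically) that the good count is clamped at zero rather than going negative, and using invented starting totals: every "-N" removes a ham registration that was never made, while every "-s" adds a spam one. The function name and the numbers are illustrative only.

```python
def apply_ns_blindly(spam: int, good: int, messages: int) -> tuple[int, int]:
    """Hypothetical sketch: run "-Ns" on messages never registered as ham.

    Returns the (spam, good) message totals afterwards, assuming the good
    count stops at zero.
    """
    for _ in range(messages):
        good = max(0, good - 1)  # "-N" removes ham that was never added
        spam += 1                # "-s" registers the message as spam
    return spam, good


# Invented starting point: 1,000 ham and 5,000 spam already registered.
print(apply_ns_blindly(spam=5000, good=1000, messages=1200))  # (6200, 0)
```

Plain "-s" on the same 1,200 messages would leave the good count at 1,000; blind "-Ns" is one plausible route to a .MSG_COUNT with a zero "good" column.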

HTH,

David



More information about the Bogofilter mailing list