procmail: Non-zero exitcode (1) from "/usr/bin/bogofilter"

dhottinger at harrisonburg.k12.va.us dhottinger at harrisonburg.k12.va.us
Mon Sep 17 12:47:25 CEST 2007


Quoting David Relson <relson at osagesoftware.com>:

> On Fri, 14 Sep 2007 08:26:59 -0400
> dhottinger at harrisonburg.k12.va.us wrote:
>
>> Quoting David Relson <relson at osagesoftware.com>:
>>
>> > On Thu, 13 Sep 2007 07:57:20 -0400
>> > dhottinger at harrisonburg.k12.va.us wrote:
>> >
>> >> Quoting David Relson <relson at osagesoftware.com>:
>> >>
>> >> > On Thu, 13 Sep 2007 06:35:05 -0400
>> >> > dhottinger at harrisonburg.k12.va.us wrote:
>> >> >
>> >> > ..[snip]...
>> >> >> Thanks,
>> >> >> Isn't there a way to get the number of spam tokens and ham
>> >> >> tokens, some kind of ratio from my wordlist.db?  Is it possible
>> >> >> that my wordlist just got out of whack?
>> >> >
>> >> > "bogoutil -p ..../wordlist.db .MSG_COUNT" will show the ratio of
>> >> > registered spam to ham messages.
>> >> >
>> >> > Actually counting spam vs ham tokens is tougher as each token has
>> >> > its spam and ham counts stored with it (as the "tail" experiment
>> >> > showed).
>> >> >
>> >> > "Pure spam" tokens would have "good" counts of 0, etc.  Most
>> >> > tokens have both "good" and "bad" counts, as the following shows
>> >> >
>> >> >   bogoutil -p ... Dwayne "from:Dwayne " "to:Dwayne "
>> >> >
>> >> > Cheers!
>> >> >
>> >> > David
>> >> >
>> >> I ran    bogoutil -p ..../wordlist.db .MSG_COUNT
>> >> spam    good    Fisher
>> >> 111746    0      nan
>> >>
>> >> Not sure what Fisher is, but I upgraded to the latest version of
>> >> bogofilter this morning.  Looks like I have no good counts in my
>> >> wordlist.  Wonder what happens if I feed bogofilter with some good
>> >> email?  Perhaps using bogofilter -nv < /path/to mailbox?
>> >
>> > "Fisher" refers to the "Robinson-Fisher" variation for generating a
>> > message's final score and is a relic of the days when bogofilter
>> > had a trio of scoring methods, i.e. Graham, Robinson, and
>> > Robinson-Fisher.
>> >
>> > The "nan" value means "not a number" indicating a division by zero
>> > issue.  The zero "good" count indicates something significantly
>> > wrong.
>> >
>> > Bogofilter needs both good and spam email to work properly.  With a
>> > "zero" good count, it can't work.  Certainly feeding a bunch of ham
>> > to it would help.  Ideally there's a reasonable balance of ham to
>> > spam. Though there's no precise proper ratio for "balance", under
>> > 1::10 will likely work.  Have you 11,000 ham to train with?  What
>> > might work a lot better is to check wordlist.db files in your
>> > backup tapes to find a wordlist with reasonable .MSG_COUNT values.
>> >
>> > HTH,
>> >
>> > David
>> >
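The rough 1:10 guideline above can be turned into a quick sanity check. This is only a sketch: check_balance is a hypothetical helper, the spam and good counts are typed in by hand from the .MSG_COUNT line rather than parsed from bogoutil output, and the 1:10 threshold is David's rule of thumb, not a hard limit.

```shell
#!/bin/sh
# check_balance SPAM GOOD
# Hypothetical helper: compares hand-entered .MSG_COUNT values against
# the rough "1 ham per 10 spam" guideline mentioned in this thread.
check_balance() {
    spam=$1
    good=$2
    # Ham messages needed for a 1:10 ham-to-spam ratio, rounded up.
    needed=$(( (spam + 9) / 10 ))
    if [ "$good" -eq 0 ]; then
        echo "no ham registered; about $needed needed"
    elif [ "$good" -lt "$needed" ]; then
        echo "unbalanced; about $((needed - good)) more ham needed"
    else
        echo "ok"
    fi
}

# Counts from the output quoted above: 111746 spam, 0 good.
check_balance 111746 0
```

With the counts quoted above this lands close to the "11,000 ham" figure David mentions.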
>>
>> After I sent the email, I fed several users' mailboxes (after
>> checking for spam) into bogofilter as ham.  This seems to have helped
>> quite a bit and put things back into focus.  I've been trying to feed
>> both, and have a "report as innocent" option in webmail, which very few
>> users are using.  This puts emails into a non-spam mailbox which I
>> then import into bogofilter using bogofilter -nv
>> < /var/local/not-spam.  I usually don't get but 1-3 emails a month
>> reported as innocent though.   Emails that sneak through get reported
>> as spam and imported using: bogofilter -Nsv < /var/local/imp-spam.
>> I'm thinking I should change this and use bogofilter -sv instead.
>> Maybe things will stay a little closer to center then.  I really
>> appreciate all the information.  It helps to get an expert opinion.
>>
>> ddh
>>
>> -- Dwayne Hottinger
>> Network Administrator
>> Harrisonburg City Public Schools
>
>
> "-N -s" is appropriate when the message was (1) incorrectly entered into
> the wordlist as ham, i.e. using "-n", (2) should have been entered as
> "spam", and (3) you're correcting the situation.
>
> Most often "-Ns" is used when the "-u" (auto-update option) is part of
> the default running of bogofilter.  With auto-update, "-Ns" is used for
> fixing a false negative (spam incorrectly classified as ham).
>
> If a message has _not_ been registered as ham, then using "-N" is
> WRONG.  Using "-Ns" frequently (and incorrectly) could result in the
> ham count being forced down to zero (which is a very BAD thing).
>
> HTH,
>
> David
>
Which is exactly what has happened in my case.  So I need to change my
script, perhaps to use just bogofilter -sv < /var/local/imp-spam
(imp-spam is an mbox) instead of -Ns.  But the emails that get reported
as spam were incorrectly labeled as ham, and in most cases the imp-spam
mailbox contains quite a few copies of the same email.
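
To see why the choice between -s and -Ns matters for the ham count, here is a toy model of the two message counters. It is not bogofilter code: simulate is a hypothetical function, the starting numbers are made up, and clamping the good count at zero is an assumption based on David's description of the count being "forced down to zero".

```shell
#!/bin/sh
# simulate MODE HAM_START REPORTS
# Toy model only: MODE is "-s" (register as spam) or "-Ns" (unregister
# as ham, then register as spam). REPORTS is the number of reported
# spam messages, none of which were ever registered as ham.
simulate() {
    mode=$1
    good=$2
    reports=$3
    spam=0
    i=0
    while [ "$i" -lt "$reports" ]; do
        # "-Ns" also takes one ham registration away per message.
        # Assumption: the counter stops at zero instead of going negative.
        if [ "$mode" = "-Ns" ] && [ "$good" -gt 0 ]; then
            good=$((good - 1))
        fi
        spam=$((spam + 1))
        i=$((i + 1))
    done
    printf '%s: spam=%d good=%d\n' "$mode" "$spam" "$good"
}

simulate -Ns 5 8
simulate -s 5 8
```

Going by David's explanation, -Ns is only the right choice if the reported messages were actually registered as ham first, for example when bogofilter runs with -u in the delivery recipe. If they were merely scored as ham and never registered, plain -s leaves the good count alone.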

thanks,
ddh


-- 
Dwayne Hottinger
Network Administrator
Harrisonburg City Public Schools



