procmail: Non-zero exitcode (1) from "/usr/bin/bogofilter"
dhottinger at harrisonburg.k12.va.us
Mon Sep 17 12:47:25 CEST 2007
Quoting David Relson <relson at osagesoftware.com>:
> On Fri, 14 Sep 2007 08:26:59 -0400
> dhottinger at harrisonburg.k12.va.us wrote:
>
>> Quoting David Relson <relson at osagesoftware.com>:
>>
>> > On Thu, 13 Sep 2007 07:57:20 -0400
>> > dhottinger at harrisonburg.k12.va.us wrote:
>> >
>> >> Quoting David Relson <relson at osagesoftware.com>:
>> >>
>> >> > On Thu, 13 Sep 2007 06:35:05 -0400
>> >> > dhottinger at harrisonburg.k12.va.us wrote:
>> >> >
>> >> > ..[snip]...
>> >> >> Thanks,
>> >> >> Isn't there a way to get the number of spam tokens and ham
>> >> >> tokens, some kind of ratio from my wordlist.db? Is it possible
>> >> >> that my wordlist just got out of whack?
>> >> >
>> >> > "bogoutil -p ..../wordlist.db .MSG_COUNT" will show the ratio of
>> >> > registered spam to ham messages.
>> >> >
>> >> > Actually counting spam vs ham tokens is tougher as each token has
>> >> > its spam and ham counts stored with it (as the "tail" experiment
>> >> > showed).
>> >> >
>> >> > "Pure spam" tokens would have "good" counts of 0, etc. Most
>> >> > tokens have both "good" and "bad" counts, as the following shows
>> >> >
>> >> > bogoutil -p ... Dwayne "from:Dwayne " "to:Dwayne "
>> >> >
>> >> > Cheers!
>> >> >
>> >> > David
>> >> >
>> >> I ran bogoutil -p ..../wordlist.db .MSG_COUNT
>> >> spam good Fisher
>> >> 111746 0 nan
>> >>
>> >> Not sure what Fisher is, but I upgraded to the latest version of
>> >> bogofilter this morning. Looks like I have no good counts in my
>> >> wordlist. I wonder what would happen if I fed bogofilter some good
>> >> email? Perhaps using bogofilter -nv < /path/to mailbox?
>> >
>> > "Fisher" refers to the "Robinson-Fisher" variation for generating a
>> > message's final score and is a relic of the days when bogofilter
>> > had a trio of scoring methods, i.e. Graham, Robinson, and
>> > Robinson-Fisher.
>> >
>> > The "nan" value means "not a number" indicating a division by zero
>> > issue. The zero "good" count indicates something significantly
>> > wrong.
>> >
>> > Bogofilter needs both good and spam email to work properly. With a
>> > "zero" good count, it can't work. Certainly feeding a bunch of ham
>> > to it would help. Ideally there's a reasonable balance of ham to
>> >> spam. Though there's no precise proper ratio for "balance", under
>> >> 1:10 will likely work. Have you 11,000 ham to train with? What
>> > might work a lot better is to check wordlist.db files in your
>> > backup tapes to find a wordlist with reasonable .MSG_COUNT values.
>> >
>> > HTH,
>> >
>> > David
>> >
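To make the "nan" concrete, here is a minimal Python sketch of Robinson's per-token estimate p(w) = b/(b+g), where b and g are a token's spam and ham counts divided by the total spam and ham message counts. It assumes the Fisher column is derived along these lines and that, as with C doubles, 0/0 yields nan. The helpers ieee_div and token_prob are hypothetical illustrations, not bogofilter functions, and bogofilter's real code may differ in detail.

```python
import math


def ieee_div(num: float, den: float) -> float:
    """Divide the way C doubles do: 0/0 gives nan, x/0 gives +/-inf.

    Plain Python raises ZeroDivisionError instead, so mimic IEEE 754 here.
    """
    if den == 0:
        if num == 0 or math.isnan(num):
            return math.nan
        return math.copysign(math.inf, num)
    return num / den


def token_prob(spam_count: int, good_count: int, msg_spam: int, msg_good: int) -> float:
    """Robinson-style p(w) = b / (b + g), counts normalised by message totals.

    Hypothetical helper for illustration; not bogofilter's own code.
    """
    b = ieee_div(spam_count, msg_spam)   # fraction of spam messages containing w
    g = ieee_div(good_count, msg_good)   # fraction of ham messages containing w
    return ieee_div(b, b + g)


# The .MSG_COUNT values reported above: 111746 spam, 0 good.
print(token_prob(111746, 0, 111746, 0))  # nan: g is 0/0, and nan propagates
# A wordlist with some ham gives an ordinary probability.
print(token_prob(10, 10, 1000, 100))     # b = 0.01, g = 0.1, so about 0.0909
```

Once some ham has been registered, the ham message total is no longer zero and the division is well defined again.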
>>
>> After I sent the email, I fed several users' mailboxes (after
>> checking them for spam) into bogofilter as ham. This seems to have
>> helped quite a bit and put things back into focus. I've been trying to
>> feed both kinds, and I have a "report as innocent" option in webmail,
>> which very few users use. It puts emails into a non-spam mailbox,
>> which I then import into bogofilter using
>> "bogofilter -nv < /var/local/not-spam". I usually get only 1-3 emails
>> a month reported as innocent, though. Emails that sneak through get
>> reported as spam and imported using
>> "bogofilter -Nsv < /var/local/imp-spam". I'm thinking I should change
>> this and use "bogofilter -sv" instead. Maybe things will stay a little
>> closer to center then. I really appreciate all the information; it
>> helps to get an expert opinion.
>>
>> ddh
>>
>> -- Dwayne Hottinger
>> Network Administrator
>> Harrisonburg City Public Schools
>
>
> "-N -s" is appropriate when the message was (1) incorrectly entered into
> the wordlist as ham, i.e. using "-n", (2) should have been entered as
> "spam", and (3) you're correcting the situation.
>
> Most often "-Ns" is used when "-u" (the auto-update option) is part of
> bogofilter's default invocation. With auto-update, "-Ns" is used for
> fixing a false negative (spam incorrectly classified as ham).
>
> If a message has _not_ been registered as ham, then using "-N" is
> WRONG. Using "-Ns" frequently (and incorrectly) could result in the
> ham count being forced down to zero (which is a very BAD thing).
>
> HTH,
>
> David
>
Which is exactly what has happened in my case. So, that being said, I
need to change my script, perhaps to use just
"bogofilter -sv < /var/local/imp-spam" (imp-spam is an mbox) instead of
"-Ns". But the emails that get reported as spam were incorrectly
labeled as ham, and in most cases there are quite a few copies of the
same email in the imp-spam mailbox.
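David's point about "-Ns" can be illustrated with a toy model of the .MSG_COUNT pair. This is a minimal Python sketch, not bogofilter code: the MsgCount class and its register() method are hypothetical, and the model assumes each registration changes the message counts by exactly one and that counts are clamped at zero rather than going negative. It compares running "-Ns" and "-s" over spam that was never registered as ham.

```python
from dataclasses import dataclass


@dataclass
class MsgCount:
    """Toy model of the .MSG_COUNT pair; hypothetical, not bogofilter's API."""
    spam: int = 0
    good: int = 0

    def register(self, flags: str) -> None:
        """Apply one registration; assumes counts are clamped at zero."""
        if flags == "-s":        # register as spam
            self.spam += 1
        elif flags == "-n":      # register as ham
            self.good += 1
        elif flags == "-Ns":     # undo an earlier ham registration, then add as spam
            self.good = max(0, self.good - 1)
            self.spam += 1
        elif flags == "-Sn":     # undo an earlier spam registration, then add as ham
            self.spam = max(0, self.spam - 1)
            self.good += 1
        else:
            raise ValueError(f"flags not modelled: {flags}")


# 500 reported spams that were never registered as ham in the first place.
wrong = MsgCount(spam=1000, good=300)
right = MsgCount(spam=1000, good=300)
for _ in range(500):
    wrong.register("-Ns")   # "unregisters" ham that was never there
    right.register("-s")
print(wrong)  # MsgCount(spam=1500, good=0)
print(right)  # MsgCount(spam=1500, good=300)
```

When "-u" is in effect, a false negative has already been registered as ham, so "-Ns" is the matching correction; without auto-update, plain "-s" leaves the ham count intact.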
thanks,
ddh
--
Dwayne Hottinger
Network Administrator
Harrisonburg City Public Schools