procmail: Non-zero exitcode (1) from "/usr/bin/bogofilter"
dhottinger at harrisonburg.k12.va.us
Fri Sep 14 14:26:59 CEST 2007
Quoting David Relson <relson at osagesoftware.com>:
> On Thu, 13 Sep 2007 07:57:20 -0400
> dhottinger at harrisonburg.k12.va.us wrote:
>
>> Quoting David Relson <relson at osagesoftware.com>:
>>
>> > On Thu, 13 Sep 2007 06:35:05 -0400
>> > dhottinger at harrisonburg.k12.va.us wrote:
>> >
>> > ..[snip]...
>> >> Thanks,
>> >> Isn't there a way to get the number of spam tokens and ham tokens,
>> >> some kind of ratio from my wordlist.db? Is it possible that my
>> >> wordlist just got out of whack?
>> >
>> > "bogoutil -p ..../wordlist.db .MSG_COUNT" will show the ratio of
>> > registered spam to ham messages.
>> >
>> > Actually counting spam vs ham tokens is tougher as each token has
>> > its spam and ham counts stored with it (as the "tail" experiment
>> > showed).
>> >
>> > "Pure spam" tokens would have "good" counts of 0, etc. Most tokens
>> > have both "good" and "bad" counts, as the following shows
>> >
>> > bogoutil -p ... Dwayne "from:Dwayne " "to:Dwayne "
>> >
>> > Cheers!
>> >
>> > David
>> >
>> I ran bogoutil -p ..../wordlist.db .MSG_COUNT
>>   spam  good  Fisher
>> 111746     0     nan
>>
>> Not sure what Fisher is, but I upgraded to the latest version of
>> bogofilter this morning. Looks like I have no good counts in my
>> wordlist. Wonder what happens if I feed bogofilter with some good
>> email? Perhaps using bogofilter -nv < /path/to/mailbox?
>
> "Fisher" refers to the "Robinson-Fisher" variation for generating a
> message's final score and is a relic of the days when bogofilter had a
> trio of scoring methods, i.e. Graham, Robinson, and Robinson-Fisher.
>
> The "nan" value means "not a number" indicating a division by zero
> issue. The zero "good" count indicates something significantly wrong.
>
> Bogofilter needs both good and spam email to work properly. With a
> "zero" good count, it can't work. Certainly feeding a bunch of ham to
> it would help. Ideally there's a reasonable balance of ham to spam.
> Though there's no precise proper ratio for "balance", under 1:10 will
> likely work. Have you 11,000 ham to train with? What might work a lot
> better is to check wordlist.db files in your backup tapes to find a
> wordlist with reasonable .MSG_COUNT values.
>
> HTH,
>
> David
>
After I sent the email, I fed several users' mailboxes (after checking
them for spam) into bogofilter as ham. This seems to have helped quite a
bit and put things back into focus.

I've been trying to feed both ham and spam. Webmail has a "report as
innocent" option, but very few users use it; I usually get only 1-3
emails a month reported as innocent. Those reports go into a non-spam
mailbox, which I then import with bogofilter -nv < /var/local/not-spam.
Emails that sneak through get reported as spam and imported with
bogofilter -Nsv < /var/local/imp-spam. I'm thinking I should change this
and use bogofilter -sv instead, since -N also unregisters each message
from the good counts. Maybe things will stay a little closer to center
then.

I really appreciate all the information. It helps to get an expert
opinion.
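
To keep an eye on the spam/good balance discussed above, something like
the following could be run against the .MSG_COUNT output from time to
time. It is only a rough sketch: the function name msg_count_balance is
made up for illustration, the input format is assumed to match the
output quoted earlier in this thread (two adjacent integer columns, spam
first, then good), and the default limit of 10 is just the rule of thumb
mentioned above, not a value bogofilter itself defines.

```shell
#!/bin/sh
# Hypothetical helper: summarise the spam/good message counts that
# "bogoutil -p <wordlist> .MSG_COUNT" prints, reading them from stdin.
# Optional argument: largest spam-per-ham ratio still treated as "ok"
# (defaults to 10, a rule of thumb only).
msg_count_balance() {
    awk -v max="${1:-10}" '
    {
        # Header lines contain no numbers and are skipped. The first
        # pair of adjacent integer fields is taken as spam, then good.
        for (i = 1; i < NF; i++) {
            if ($i ~ /^[0-9]+$/ && $(i + 1) ~ /^[0-9]+$/) {
                spam = $i
                good = $(i + 1)
                if (good == 0) {
                    # No ham registered: the ratio cannot be computed.
                    printf "spam=%d good=0 ratio=undefined status=no-ham\n", spam
                } else {
                    ratio = spam / good
                    printf "spam=%d good=%d ratio=%.1f status=%s\n", spam, good, ratio, (ratio > max ? "skewed" : "ok")
                }
                exit
            }
        }
    }'
}
```

It would be used as
bogoutil -p ..../wordlist.db .MSG_COUNT | msg_count_balance
and for the counts shown earlier it reports status=no-ham.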
ddh
-- Dwayne Hottinger
Network Administrator
Harrisonburg City Public Schools