procmail: Non-zero exitcode (1) from "/usr/bin/bogofilter"

dhottinger at harrisonburg.k12.va.us
Fri Sep 14 14:26:59 CEST 2007


Quoting David Relson <relson at osagesoftware.com>:

> On Thu, 13 Sep 2007 07:57:20 -0400
> dhottinger at harrisonburg.k12.va.us wrote:
>
>> Quoting David Relson <relson at osagesoftware.com>:
>>
>> > On Thu, 13 Sep 2007 06:35:05 -0400
>> > dhottinger at harrisonburg.k12.va.us wrote:
>> >
>> > ..[snip]...
>> >> Thanks,
>> >> Isn't there a way to get the number of spam tokens and ham tokens,
>> >> some kind of ratio from my wordlist.db?  Is it possible that my
>> >> wordlist just got out of whack?
>> >
>> > "bogoutil -p ..../wordlist.db .MSG_COUNT" will show the ratio of
>> > registered spam to ham messages.
>> >
>> > Actually counting spam vs ham tokens is tougher as each token has
>> > its spam and ham counts stored with it (as the "tail" experiment
>> > showed).
>> >
>> > "Pure spam" tokens would have "good" counts of 0, etc.  Most tokens
>> > have both "good" and "bad" counts, as the following shows
>> >
>> >   bogoutil -p ... Dwayne "from:Dwayne " "to:Dwayne "
>> >
>> > Cheers!
>> >
>> > David
>> >
>> I ran    bogoutil -p ..../wordlist.db .MSG_COUNT
>> spam    good    Fisher
>> 111746    0      nan
>>
>> Not sure what Fisher is, but I upgraded to the latest version of
>> bogofilter this morning.  Looks like I have no good counts in my
>> wordlist.  Wonder what happens if I feed bogofilter with some good
>> email?  Perhaps using bogofilter -nv < /path/to/mailbox?
>
> "Fisher" refers to the "Robinson-Fisher" variation for generating a
> message's final score and is a relic of the days when bogofilter had a
> trio of scoring methods, i.e. Graham, Robinson, and Robinson-Fisher.
>
> The "nan" value means "not a number" indicating a division by zero
> issue.  The zero "good" count indicates something significantly wrong.
>
> Bogofilter needs both good and spam email to work properly.  With a
> "zero" good count, it can't work.  Certainly feeding a bunch of ham to
> it would help.  Ideally there's a reasonable balance of ham to spam.
> Though there's no precise proper ratio for "balance", under 1:10 will
> likely work.  Have you 11,000 ham to train with?  What might work a lot
> better is to check wordlist.db files in your backup tapes to find a
> wordlist with reasonable .MSG_COUNT values.
>
> HTH,
>
> David
>
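
As a rough illustration of the arithmetic behind the "11,000 ham" figure, here is a minimal Python sketch. It parses the two-line .MSG_COUNT listing quoted above and estimates how much more ham would be needed. The function names are hypothetical, and reading the 1:10 rule of thumb as "at least one ham message per ten spam messages" is an interpretation, not something bogofilter enforces.

```python
def parse_msg_count(output: str) -> tuple[int, int]:
    """Parse a header line plus one value line into (spam, good).

    Assumes the whitespace-separated layout quoted in this thread:
        spam    good    Fisher
        111746    0      nan
    """
    header, values = [line.split() for line in output.strip().splitlines()]
    row = dict(zip(header, values))
    return int(row["spam"]), int(row["good"])


def ham_needed(spam: int, good: int, max_spam_per_ham: int = 10) -> int:
    """Extra ham messages needed so that spam/good <= max_spam_per_ham.

    The default of 10 is the rule of thumb from the thread (an assumption,
    not a bogofilter setting).
    """
    target = -(-spam // max_spam_per_ham)  # integer ceiling division
    return max(0, target - good)


if __name__ == "__main__":
    listing = "spam    good    Fisher\n111746    0      nan\n"
    spam, good = parse_msg_count(listing)
    print(f"spam={spam} good={good} ham still needed={ham_needed(spam, good)}")
```

With the counts from the thread (111746 spam, 0 good), this works out to 11175 ham messages, in line with the roughly 11,000 mentioned above.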

After I sent the email, I fed several users' mailboxes (after checking
them for spam) into bogofilter as ham.  This seems to have helped quite
a bit and put things back into focus.  I've been trying to feed it both
ham and spam.  Webmail has a "report as innocent" option, which very
few users are using.  It puts emails into a non-spam mailbox, which I
then import into bogofilter using:

  bogofilter -nv < /var/local/not-spam

I usually only get 1-3 emails a month reported as innocent, though.
Emails that sneak through get reported as spam and imported using:

  bogofilter -Nsv < /var/local/imp-spam

I'm thinking I should change this and use bogofilter -sv instead, so
that registering a message as spam no longer also unregisters it as
ham.  Maybe things will stay a little closer to center then.  I really
appreciate all the information.  It helps to get an expert opinion.
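
The difference between the -Nsv and -sv invocations above can be pictured with a toy model of the message counters, sketched here in Python. This is not bogofilter's code. It only assumes the documented meaning of the flags (-s registers as spam, -n registers as non-spam, -S and -N unregister). The MsgCount class, the apply_flags helper, and the flooring of counts at zero are hypothetical choices made for illustration, and nothing here establishes that this is what emptied the good count in this thread.

```python
from dataclasses import dataclass


@dataclass
class MsgCount:
    """Toy stand-in for the spam/good message counts in a wordlist."""
    spam: int = 0
    good: int = 0


def apply_flags(counts: MsgCount, flags: str, messages: int = 1) -> MsgCount:
    """Apply registration flags to the toy counters for `messages` messages.

    Flooring at zero is an assumption of this sketch.
    Flags that do not register anything (such as "v") are ignored.
    """
    for flag in flags:
        if flag == "s":
            counts.spam += messages
        elif flag == "n":
            counts.good += messages
        elif flag == "S":
            counts.spam = max(0, counts.spam - messages)
        elif flag == "N":
            counts.good = max(0, counts.good - messages)
    return counts


if __name__ == "__main__":
    # Five missed spams reported by users, starting from the same counts.
    print(apply_flags(MsgCount(spam=100, good=20), "Nsv", messages=5))
    print(apply_flags(MsgCount(spam=100, good=20), "sv", messages=5))
```

In this model, -Nsv raises the spam count and lowers the good count for every reported message, while -sv leaves the good count alone. The combined form fits messages that were earlier registered as ham; for messages that never were, -sv avoids subtracting ham that was never added.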

ddh

-- Dwayne Hottinger
Network Administrator
Harrisonburg City Public Schools



