procmail: Non-zero exitcode (1) from "/usr/bin/bogofilter"

Tom Anderson tanderso at oac-design.com
Fri Sep 14 15:34:04 CEST 2007


dhottinger at harrisonburg.k12.va.us wrote:
> Quoting David Relson <relson at osagesoftware.com>:
> 
>> On Thu, 13 Sep 2007 07:57:20 -0400
>> dhottinger at harrisonburg.k12.va.us wrote:
>>
>>> Quoting David Relson <relson at osagesoftware.com>:
>>>
>>>> On Thu, 13 Sep 2007 06:35:05 -0400
>>>> dhottinger at harrisonburg.k12.va.us wrote:
>>>>
>>>> ..[snip]...
>>> I ran    bogoutil -p ..../wordlist.db .MSG_COUNT
>>>   spam    good    Fisher
>>> 111746       0       nan
>> Bogofilter needs both good and spam email to work properly.  With a
>> "zero" good count, it can't work.  Certainly feeding a bunch of ham to
>> it would help.  Ideally there's a reasonable balance of ham to spam.
>> Though there's no precise proper ratio for "balance", under 1::10 will
>> likely work.  Have you 11,000 ham to train with?  What might work a lot
>> better is to check wordlist.db files in your backup tapes to find a
>> wordlist with reasonable .MSG_COUNT values.
> 
> After I sent the email, I fed several users' mailboxes (after checking
> for spam) into bogofilter as ham.  This seems to have helped quite a
> bit and put things back into focus.  I've been trying to feed both, and
> have a "report as innocent" option in webmail, which very few users are
> using.  This puts emails into a non-spam mailbox, which I then import
> into bogofilter using bogofilter -nv < /var/local/not-spam.  I usually
> don't get more than 1-3 emails a month reported as innocent, though.
> Emails that sneak through get reported as spam and imported using:
> bogofilter -Nsv < /var/local/imp-spam.  I'm thinking I should change
> this and use bogofilter -sv instead.  Maybe things will stay a little
> closer to center then.  I really appreciate all the information.  It
> helps to get an expert opinion.

                                  spam    good    Fisher
.MSG_COUNT                     820229   35342  0.500000

My spam-to-ham ratio is about 23:1 (see the .MSG_COUNT line above) and it
works perfectly.  I don't think the ratio is important at all.  Just
registering a single ham should get you going in the right direction, so
that the good count is no longer zero and you don't get divide-by-zero
errors (the "nan" in your Fisher column).  Then just train on
classification errors and your accuracy should stabilize.  No need to
jump through hoops to keep a particular ratio.  In fact, the ideal ratio
is probably precisely the ratio of ham to spam you actually receive on a
regular basis.
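
To illustrate where the "nan" comes from, here is a rough Python sketch of
a Robinson-style per-token score.  It is not bogofilter's actual code: the
function name token_score is made up for this example, and the ROBX and
ROBS values are assumed defaults that may differ from your configuration.
The point is only that a zero good-message count puts a zero in a
denominator, and that a single registered ham removes it.

```python
import math

# Assumed smoothing parameters (illustrative; check your own bogofilter config).
ROBX = 0.52    # score assumed for a token that has never been seen
ROBS = 0.0178  # weight given to that assumption


def token_score(spam_count, good_count, spam_msgs, good_msgs,
                robx=ROBX, robs=ROBS):
    """Robinson-style smoothed spam probability for one token (a sketch)."""
    if spam_msgs == 0 or good_msgs == 0:
        # Dividing by a zero message count: C floating point gives nan,
        # Python would raise ZeroDivisionError, so return nan explicitly.
        return math.nan
    spam_freq = spam_count / spam_msgs
    good_freq = good_count / good_msgs
    if spam_freq + good_freq == 0:
        return robx  # token never seen: fall back to the prior
    p = spam_freq / (spam_freq + good_freq)
    n = spam_count + good_count
    return (robs * robx + n * p) / (robs + n)


# .MSG_COUNT appears once per registered message, so its counts equal
# the message totals.
print(token_score(111746, 0, 111746, 0))                    # nan
print(round(token_score(820229, 35342, 820229, 35342), 6))  # 0.5
print(round(820229 / 35342, 1))                             # 23.2
```

With the counts from this thread, the sketch gives nan for the wordlist
with no ham and 0.5 for mine, which matches the two Fisher columns shown
above.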

Tom



