procmail: Non-zero exitcode (1) from "/usr/bin/bogofilter"
Tom Anderson
tanderso at oac-design.com
Fri Sep 14 15:34:04 CEST 2007
dhottinger at harrisonburg.k12.va.us wrote:
> Quoting David Relson <relson at osagesoftware.com>:
>
>> On Thu, 13 Sep 2007 07:57:20 -0400
>> dhottinger at harrisonburg.k12.va.us wrote:
>>
>>> Quoting David Relson <relson at osagesoftware.com>:
>>>
>>>> On Thu, 13 Sep 2007 06:35:05 -0400
>>>> dhottinger at harrisonburg.k12.va.us wrote:
>>>>
>>>> ..[snip]...
>>> I ran bogoutil -p ..../wordlist.db .MSG_COUNT
>>>   spam  good  Fisher
>>> 111746     0     nan
>> Bogofilter needs both good and spam email to work properly. With a
>> "zero" good count, it can't work. Certainly feeding a bunch of ham to
>> it would help. Ideally there's a reasonable balance of ham to spam.
>> Though there's no precise proper ratio for "balance", under 1::10 will
>> likely work. Have you 11,000 ham to train with? What might work a lot
>> better is to check wordlist.db files in your backup tapes to find a
>> wordlist with reasonable .MSG_COUNT values.
>
> After I sent the email, I fed several users' mailboxes (after checking
> for spam) into bogofilter as ham. This seems to have helped quite a
> bit and put things back into focus. I've been trying to feed both, and
> have a "report as innocent" option in webmail, which very few users are
> using. This puts emails into a non-spam mailbox which I then import
> into bogofilter using bogofilter -nv < /var/local/not-spam. I usually
> don't get but 1-3 emails a month reported as innocent though. Emails
> that sneak through get reported as spam and imported using: bogofilter
> -Nsv < /var/local/imp-spam. I'm thinking I should change this and use
> bogofilter -sv instead. Maybe things will stay a little closer to
> center then. I really appreciate all the information. It helps to
> get an expert opinion.
              spam   good    Fisher
.MSG_COUNT  820229  35342  0.500000
My ratio is about 23::1 (spam to ham) and it works perfectly. I don't
think the ratio is important at all. Just registering a single ham
should get you going in the right direction so that you don't get
divide-by-zero errors. Then just train on classification errors and
your accuracy should stabilize. No need to jump through hoops to keep
a particular ratio. In fact, the ideal ratio is probably precisely the
ratio of ham to spam you actually receive on a regular basis.
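
For anyone who wants to check the arithmetic, here is a rough sketch in
Python. It only computes the spam-to-ham ratio from the two .MSG_COUNT
values quoted in this thread; it is not bogofilter's scoring code, and the
function name spam_to_ham_ratio is made up for illustration.

```python
def spam_to_ham_ratio(spam, good):
    """Return spam messages per ham message, or None when no ham is registered."""
    if good == 0:
        # With zero ham the ratio is undefined, so report that explicitly
        # instead of dividing by zero.
        return None
    return spam / good


# Counts taken from the two .MSG_COUNT lines quoted in this thread.
print(round(spam_to_ham_ratio(820229, 35342), 1))  # 23.2
print(spam_to_ham_ratio(111746, 0))                # None
```

The second call is the "zero good count" case from earlier in the thread:
until at least one ham is registered there is nothing to divide by.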
Tom