Article on bogofilter

David Relson relson at osagesoftware.com
Thu Feb 6 14:01:03 CET 2003


At 07:44 AM 2/6/03, Jake Di Toro wrote:

>On Wed, Feb 05, 2003 at 11:33:01PM -0500, David Relson wrote:
> > At 11:00 PM 2/5/03, Jake Di Toro wrote:
> >
> > The article doesn't seem _that_ bad.  I'll be glad to write to him.
> >
> > I only spotted a couple of inaccuracies, specifically author info, version
> > info, and return codes.  Also the sentence on using -n instead of -s is
> > unclear.  What else did you see?
> >
>
>I think the biggest thing that got to me was his concept of wordlist
>population.  He mentions that the number of messages doesn't matter,
>and when he dumps a directory of 100+ msgs it only detects 3.  And
>then there's his nightly cron job.  He makes no provision to clear out
>the spam/nonspam directory, nor to reset the wordlist.  So the way that
>cron job runs, he's reregistering every piece of mail every night.

Jake,

Good points.  I was looking at other details.

When he lists his spam directory, he shows 13 files.  When he runs "cat 
spam/* | bogofilter -s -v" he shows "93861 words, 3 messages".  That's a 
lot of words for so few messages.  I bet that Sylpheed's maildir doesn't 
include "^From " lines.  If so, the cat command isn't generating the mailbox 
format that bogofilter expects, which would explain the wrong message count.
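
A minimal sketch of the point above, assuming a reader that counts one message per line beginning with "From " (the usual mbox separator). How bogofilter itself counts messages is not shown in this thread, so the counting function is only an illustration; the names count_mbox_messages and to_mbox, the sender, and the date string are all hypothetical.

```python
import io


def count_mbox_messages(stream):
    """Count messages as an mbox-style reader might: one per 'From ' line."""
    return sum(1 for line in stream if line.startswith("From "))


def to_mbox(messages, sender="MAILER-DAEMON", date="Thu Feb  6 14:01:03 2003"):
    """Join raw messages into mbox text.

    Each message gets a 'From ' separator line in front of it, and any
    line inside a message that starts with 'From ' is escaped as '>From '
    so it is not mistaken for a separator.
    """
    parts = []
    for msg in messages:
        lines = msg.splitlines()
        escaped = [">" + ln if ln.startswith("From ") else ln for ln in lines]
        parts.append("From %s %s\n" % (sender, date))
        parts.append("\n".join(escaped) + "\n\n")
    return "".join(parts)


# Three one-file-per-message mails, with no separator lines of their own.
raw = [
    "Subject: one\n\nhello\n",
    "Subject: two\n\nFrom here on it gets spammy\n",
    "Subject: three\n\nagain\n",
]

# Equivalent of "cat spam/*": files glued together with no separators.
plain_cat = "".join(raw)

print(count_mbox_messages(io.StringIO(plain_cat)))     # miscounted
print(count_mbox_messages(io.StringIO(to_mbox(raw))))  # one per message
```

With the plain concatenation, the only "From " line is one that happens to sit in a message body, so the count has nothing to do with the number of files. With separators added, the count matches the number of messages.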

I wondered about the cron job, but assumed he wasn't showing the whole 
thing.  I use an hourly cron job and it moves the messages after processing 
them.
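
A rough sketch of a register-then-move job, not the actual cron script: it assumes one message per file, a hypothetical "done" directory, and that bogofilter accepts a single message on stdin with -s (spam) or -n (non-spam), as in the article's "cat spam/* | bogofilter -s -v". The function names are made up for illustration.

```python
import shutil
import subprocess
from pathlib import Path


def run_bogofilter(flag, data):
    """Feed one message to bogofilter on stdin; flag is '-s' or '-n'."""
    subprocess.run(["bogofilter", flag], input=data, check=True)


def register_and_move(incoming, done, flag, register=run_bogofilter):
    """Register each file in `incoming` once, then move it to `done`.

    Because a file leaves `incoming` as soon as it has been registered,
    running the job again cannot register the same message twice.
    Returns the names of the files handled in this run.
    """
    incoming, done = Path(incoming), Path(done)
    done.mkdir(parents=True, exist_ok=True)
    moved = []
    for path in sorted(incoming.iterdir()):
        if not path.is_file():
            continue
        register(flag, path.read_bytes())
        shutil.move(str(path), str(done / path.name))
        moved.append(path.name)
    return moved
```

A cron entry could then call something like register_and_move("spam", "spam-done", "-s") and register_and_move("nonspam", "nonspam-done", "-n"); those directory names are placeholders.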

Since he has clearly spent some time experimenting with bogofilter, I 
_assumed_ he was resetting the wordlist appropriately and wasn't double 
registering messages.  As you point out, it's impossible to tell from his 
writeup.

I'll send him a note later today.

David





