Article on bogofilter
David Relson
relson at osagesoftware.com
Thu Feb 6 14:01:03 CET 2003
At 07:44 AM 2/6/03, Jake Di Toro wrote:
>On Wed, Feb 05, 2003 at 11:33:01PM -0500, David Relson wrote:
> > At 11:00 PM 2/5/03, Jake Di Toro wrote:
> >
> > The article doesn't seem _that_ bad. I'll be glad to write to him.
> >
> > I only spotted a couple of inaccuracies, specifically author info, version
> > info, and return codes. Also the sentence on using -n instead of -s is
> > unclear. What else did you see?
> >
>
>I think the biggest thing that got to me was his concept of wordlist
>population. He mentions that the number of messages doesn't matter,
>and when he dumps a directory of 100+ msgs it is only detected as 3. And
>then his nightly cron job. He makes no provision to clear out the
>spam/nonspam directory, nor to reset the wordlist. So the way that
>cron job runs he's reregistering every piece of mail every night.
Jake,
Good points. I was looking at other details.
When he lists his spam directory, he shows 13 files. When he runs "cat
spam/* | bogofilter -s -v" he shows "93861 words, 3 messages". That's a
lot of words for so few messages. I bet that the files in Sylpheed's mail
directory don't include "^From " lines, so the cat command isn't generating
the mailbox (mbox) format that bogofilter expects, and the message count
comes out wrong.
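To make the guess above concrete, here is a minimal Python sketch of why plain concatenation could undercount. It assumes messages are told apart only by lines starting with "From ", which is the hypothesis in this note and not a confirmed description of bogofilter's parser. The function names and the placeholder separator line are hypothetical, made up for illustration.

```python
from typing import Iterable


def count_mbox_messages(text: str) -> int:
    """Count messages as an mbox-style reader might: one per line starting with 'From '."""
    return sum(1 for line in text.splitlines() if line.startswith("From "))


def to_mbox(
    messages: Iterable[str],
    # Placeholder separator line (an assumption, not taken from any real mailbox).
    separator: str = "From MAILER-DAEMON Thu Jan  1 00:00:00 1970",
) -> str:
    """Join one-message-per-file contents into a single mbox-style string."""
    chunks = []
    for msg in messages:
        # Quote body lines that would otherwise look like message separators.
        lines = [">" + line if line.startswith("From ") else line
                 for line in msg.splitlines()]
        # Each message gets its own separator line and a trailing blank line.
        chunks.append(separator + "\n" + "\n".join(lines) + "\n\n")
    return "".join(chunks)
```

With files that carry no "From " line of their own, counting over the plain concatenation only picks up whatever body lines happen to start with "From ", while counting over the output of to_mbox gives one message per file.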
I wondered about the cron job, but assumed he wasn't showing the whole
thing. I use an hourly cron job and it moves the messages after processing
them.
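A rough Python sketch of the "process, then move" idea, so that a message is never registered twice. This is not my actual cron job; the function names and directory layout are hypothetical, and the default helper assumes that bogofilter accepts a single message on stdin with -s (the flag used in the article).

```python
import shutil
import subprocess
from pathlib import Path
from typing import Callable


def register_with_bogofilter(path: Path) -> None:
    # Assumption: one message on stdin, registered as spam via "bogofilter -s".
    with path.open("rb") as fh:
        subprocess.run(["bogofilter", "-s"], stdin=fh, check=True)


def process_and_move(
    incoming: Path,
    processed: Path,
    register: Callable[[Path], None] = register_with_bogofilter,
) -> int:
    """Register every file in `incoming`, then move it to `processed`."""
    processed.mkdir(parents=True, exist_ok=True)
    count = 0
    for path in sorted(incoming.iterdir()):
        if not path.is_file():
            continue
        register(path)  # raises on failure, so the file stays put for a retry
        shutil.move(str(path), str(processed / path.name))
        count += 1
    return count
```

Because each file leaves the incoming directory as soon as it has been registered, a second run over the same directory finds nothing left to process.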
Since he has clearly spent some time experimenting with bogofilter, I
_assumed_ he was resetting the wordlist appropriately and wasn't
double-registering messages. As you point out, it's impossible to tell from his
writeup.
I'll send him a note later today.
David