Bogofilter Best Practices?

David Relson relson at osagesoftware.com
Wed Dec 9 03:49:28 CET 2009


On Tue, 8 Dec 2009 18:09:32 +0000
RW wrote:

> On Mon, 07 Dec 2009 17:00:05 -0800
> "Randy J. Ray" <rjray at blackperl.com> wrote:
> 
> 
> >  We get
> > good-enough performance and throughput on the actual classification
> > of incoming messages. It's the creation of the word-list files from
> > our (growing) corpus that is driving me nuts.
> 
> From the sound of it you are creating a wordlist from scratch from
> historical corpora, why not just learn today's mail into yesterday's
> wordlist.

As Matthias suggested, create an MH folder, maildir, or mbox for the
ham and another for the spam.

As RW suggested, do an incremental update: register only the new mail
into the existing wordlist instead of rebuilding it from scratch.

Combining the two techniques will cut the bogofilter work from many
small runs (slow) to two large runs (much, much faster).
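
A minimal sketch of what the combined approach might look like, driven
from Python. The corpus paths and the helper names are hypothetical.
It assumes the documented bogofilter options -n (register as ham),
-s (register as spam), -d (wordlist directory) and -B (bulk mode for
a maildir or MH directory), and that an mbox can be registered by
feeding it on stdin; please check these against the man page for your
version before relying on it.

```python
import subprocess
from pathlib import Path


def training_command(corpus, flag, wordlist_dir=None):
    """Build one bulk registration run for bogofilter.

    flag is "-n" (register as ham) or "-s" (register as spam).
    Returns (argv, stdin_path); stdin_path is None when the corpus
    is passed on the command line rather than on stdin.
    """
    corpus = Path(corpus)
    argv = ["bogofilter", flag]
    if wordlist_dir is not None:
        # Assumption: -d points bogofilter at the existing wordlist,
        # so the run updates it instead of starting from scratch.
        argv += ["-d", str(wordlist_dir)]
    if corpus.is_dir():
        # maildir or MH directory: hand it over in bulk mode.
        return argv + ["-B", str(corpus)], None
    # Otherwise treat it as an mbox file and feed it on stdin.
    return argv, corpus


def train(ham, spam, wordlist_dir=None):
    """Two large runs in total: one for the ham, one for the spam."""
    for corpus, flag in ((ham, "-n"), (spam, "-s")):
        argv, stdin_path = training_command(corpus, flag, wordlist_dir)
        if stdin_path is None:
            subprocess.run(argv, check=True)
        else:
            with open(stdin_path, "rb") as fh:
                subprocess.run(argv, stdin=fh, check=True)
```

Calling train("today-ham.mbox", "today-spam.mbox") once a day would
then register just that day's mail into the existing wordlist, with
one bogofilter process per corpus rather than one per message.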

HTH,

David


