Bogofilter Best Practices?
David Relson
relson at osagesoftware.com
Wed Dec 9 03:49:28 CET 2009
On Tue, 8 Dec 2009 18:09:32 +0000
RW wrote:
> On Mon, 07 Dec 2009 17:00:05 -0800
> "Randy J. Ray" <rjray at blackperl.com> wrote:
>
>
> > We get
> > good-enough performance and throughput on the actual classification
> > of incoming messages. It's the creation of the word-list files from
> > our (growing) corpus that is driving me nuts.
>
> From the sound of it you are creating a wordlist from scratch from
> historical corpora, why not just learn today's mail into yesterday's
> wordlist.
As Matthias suggested, collect the ham into one MH folder, maildir, or
mbox, and the spam into another.
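A minimal Python sketch of that collection step. It assumes the new messages arrive as one-message-per-file directories (as in an MH folder or a maildir's cur/ directory). The function name collect_into_mbox and the directory layout are hypothetical. It uses only the standard-library mailbox module to append every message to a single mbox:

```python
import mailbox
from pathlib import Path


def collect_into_mbox(src_dir, mbox_path) -> int:
    """Append every message file in src_dir (one RFC 822 message per file)
    to a single mbox file and return how many messages were added.

    Hypothetical helper: the one-file-per-message layout is an assumption.
    """
    box = mailbox.mbox(str(mbox_path))  # created if it does not exist yet
    box.lock()
    added = 0
    try:
        for path in sorted(Path(src_dir).iterdir()):
            # Skip subdirectories and dotfiles such as MH's .mh_sequences.
            if not path.is_file() or path.name.startswith("."):
                continue
            # Raw bytes keep the message intact; mailbox writes the mbox
            # "From " separator line and escapes body lines starting "From ".
            box.add(path.read_bytes())
            added += 1
        box.flush()
    finally:
        box.unlock()
        box.close()
    return added
```

Running it once for the ham directory and once for the spam directory leaves exactly two files to hand to bogofilter.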
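As RW suggested, do an incremental update: register only the new mail
into the existing wordlist instead of rebuilding it from the whole corpus.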
Combining the two techniques cuts the work from many small bogofilter
runs (slow) to two large runs, one for ham and one for spam (much, much
faster).
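A sketch of those two runs in Python. It assumes bogofilter reads a multi-message mbox on stdin when registering with -n (ham) or -s (spam), and that -d names the directory holding the existing wordlist. The helper names training_plan and run_plan and the mbox file names are hypothetical. Because the wordlist is updated in place rather than deleted and rebuilt, each run folds only the new mail into it:

```python
import subprocess
from pathlib import Path
from typing import List, Optional, Tuple

# One step of the plan: the bogofilter argv and the mbox fed to it on stdin.
Step = Tuple[List[str], Path]


def training_plan(ham_mbox, spam_mbox, bogodir: Optional[str] = None) -> List[Step]:
    """Return the two registration runs: -n for ham, -s for spam."""
    base = ["bogofilter"]
    if bogodir is not None:
        # -d selects the directory that already holds the wordlist.
        base += ["-d", str(bogodir)]
    return [
        (base + ["-n"], Path(ham_mbox)),
        (base + ["-s"], Path(spam_mbox)),
    ]


def run_plan(plan: List[Step]) -> None:
    """Execute each step, stopping on the first non-zero exit status."""
    for argv, mbox in plan:
        with mbox.open("rb") as fh:
            subprocess.run(argv, stdin=fh, check=True)
```

For example, run_plan(training_plan("today-ham.mbox", "today-spam.mbox")) is equivalent to running `bogofilter -n < today-ham.mbox` followed by `bogofilter -s < today-spam.mbox`. Building the argument lists separately from executing them makes the plan easy to log or inspect before the wordlist is touched.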
HTH,
David