Bogofilter Best Practices?
    David Relson 
    relson at osagesoftware.com
       
    Wed Dec  9 03:49:28 CET 2009
    
    
  
On Tue, 8 Dec 2009 18:09:32 +0000
RW wrote:
> On Mon, 07 Dec 2009 17:00:05 -0800
> "Randy J. Ray" <rjray at blackperl.com> wrote:
> 
> 
> >  We get
> > good-enough performance and throughput on the actual classification
> > of incoming messages. It's the creation of the word-list files from
> > our (growing) corpus that is driving me nuts.
> 
> From the sound of it you are creating a wordlist from scratch from
> historical corpora, why not just learn today's mail into yesterday's
> wordlist?

As Matthias suggested, collect the ham into one MH folder, maildir, or
mbox and the spam into another.

As RW suggested, do an incremental update: learn the new mail into the
existing wordlist rather than rebuilding it from scratch.

Combining the two techniques cuts the bogofilter work from many small
runs (slow) to 2 large runs (much, much faster).
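
For illustration, here is a minimal Python sketch of what the 2 runs might
look like. The paths and function names are hypothetical, and the flags are
as I recall them from the bogofilter man page (-n registers ham, -s
registers spam, -B processes the named files or directories in bulk, -d
names the wordlist directory), so please check them against your installed
version before relying on this.

```python
import subprocess


def training_commands(ham_path, spam_path, wordlist_dir):
    """Build one bogofilter invocation for all ham and one for all spam.

    The paths are placeholders: each should point at a single MH folder,
    maildir, or mbox holding only the new messages of that class.
    """
    base = ["bogofilter", "-d", wordlist_dir]
    return [
        base + ["-n", "-B", ham_path],   # register everything as ham
        base + ["-s", "-B", spam_path],  # register everything as spam
    ]


def train(ham_path, spam_path, wordlist_dir, dry_run=True):
    """Print the two commands, or execute them when dry_run is False."""
    for argv in training_commands(ham_path, spam_path, wordlist_dir):
        if dry_run:
            print(" ".join(argv))
        else:
            # The existing wordlist in wordlist_dir is updated in place,
            # which is what makes this an incremental update.
            subprocess.run(argv, check=True)
```

Calling train("new-ham", "new-spam", "/path/to/wordlist") prints the two
commands without running anything; pass dry_run=False once they look right.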
HTH,
David
    
    