Filters That Fight Back
Jason Rennie
jrennie at ai.mit.edu
Wed Sep 3 14:43:41 CEST 2003
jef at acme.com said:
> Eventually procmail will start to time-out and I get a big mess in my
> inbox. So, instead I use xargs and -B so that the mass registration
> gets broken up into batches and incoming mail gets a chance to run.
It might be worth building the new database somewhere else (e.g.
/tmp/.bogofilter) and overwriting the old database once the new one is
finished. If you have no control over when e-mail is incorporated,
though, this would be a bit tricky, since you'd need to obtain a lock
on the old database before overwriting it...
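
A rough sketch of that build-then-swap idea, in Python. This is only an
illustration under assumptions: the `build` callable, the
`swap_in_new_database` name and the `.lock` file name are all made up here,
and the lock only helps if whatever delivers your mail takes the same lock.
It is not how bogofilter itself handles locking.

```python
import fcntl
import os
import tempfile


def swap_in_new_database(live_path, build, lock_path=None):
    """Build a replacement for live_path off to the side, then swap it in.

    `build` is a hypothetical callable that writes the new database to the
    path it is given (for example by running the training step).
    """
    live_path = os.path.abspath(live_path)
    live_dir = os.path.dirname(live_path)
    if lock_path is None:
        lock_path = live_path + ".lock"  # assumed lock file name

    # Build next to the live file so the final rename stays on one
    # filesystem; a rename across filesystems is not atomic.
    fd, tmp_path = tempfile.mkstemp(dir=live_dir, prefix=".rebuild-")
    os.close(fd)
    try:
        build(tmp_path)  # slow step; the live database is untouched
        with open(lock_path, "w") as lock_file:
            # Block until no other cooperating process holds the lock.
            fcntl.flock(lock_file, fcntl.LOCK_EX)
            os.replace(tmp_path, live_path)  # atomic rename on POSIX
            # The lock is released when lock_file is closed.
    finally:
        # If the build or the swap failed, do not leave debris behind.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
```

Note that this builds in the same directory as the live database rather than
in /tmp. If /tmp is a separate filesystem, the rename at the end would fail
and you would need a copy step while holding the lock instead.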
relson at osagesoftware.com said:
> The problem with registering lots of messages from an MH or Maildir
> was that bogofilter updated the wordlist for each input file. That
> was slow. When you test the 0.15.0 code I think you'll find that it's
> comparably fast for mboxs and Maildirs.
I can vouch for this statement. Training with pre-0.15 versions took me
hours. Now it takes a minute or two. I have about 2000 spam and 8000 ham
in MH folders.
Here's a script that automates the process of bogofilter training for MH
folders (requires the soon-to-be-released 0.15.1 :)
http://www.ai.mit.edu/~jrennie/mail/bogoTrain-0.2.perl
Jason