training is SLOW
David Relson
relson at osagesoftware.com
Sun Aug 10 19:30:59 CEST 2003
At 12:41 PM 8/10/03, Lane P. Lester wrote:
>"Rodney D. Myers" <rdmyers at pe.net> wrote:
> > Before, it would take a few hours to run through 60,000+ email, and
> > less than 1000 spam. It was still churning along after 24 hours, and
> > not done yet.
>
>It seems you're not using the re-training method I was shown:
>http://linux.oreillynet.com/lpt/a/3167
Rodney,
We need more details on what you're doing ... While thinking about your
problem, a possible cause came to mind:
The big change in 0.14.x is to use one database, i.e. wordlist.db, for
holding both spam and ham tokens. Previously, bogofilter always used two
databases - spamlist.db and goodlist.db.
With the change, we've noticed that the size of BerkeleyDB's cache can have
a significant effect on training performance. If you want to experiment, try
something like:
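#!/bin/sh
# Time a training run at several BerkeleyDB cache sizes (-k, in MB).
for cache in 4 8 12 16 ; do
    rm -f wordlist.db            # start each run from an empty wordlist
    echo "cache size: $cache"
    time -p bogofilter -n -d . -k "$cache" < test.mbx
done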
where test.mbx is a mailbox with 1,000 messages. If the runs finish too
quickly to show a difference, try using 10,000 messages.
Anyhow, the script should help you find a cache size that works well for
your machine.
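
If you don't have a 1,000-message mailbox handy, something like the
following might do for cutting one out of a larger mbox file. It is only a
rough sketch: first_messages and big.mbx are made-up names, not part of
bogofilter, and it assumes every message starts at a line beginning with
"From " (body lines that start that way are assumed to be escaped).

```shell
#!/bin/sh
# Hypothetical helper (not part of bogofilter): write the first N messages
# of an mbox file to standard output. A message is assumed to start at a
# line beginning with "From ", the usual mbox separator.
first_messages() {
    # $1 = number of messages to keep, $2 = source mbox file
    awk -v max="$1" '
        /^From / { count++ }     # a new message starts on this line
        count > max { exit }     # stop at the first message past the limit
        { print }
    ' "$2"
}

# Example (big.mbx is an assumed file name):
# first_messages 1000 big.mbx > test.mbx
```

Change 1000 to 10000 for the larger test.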
David