0.92.6's bogotune much slower?
David Relson
relson at osagesoftware.com
Fri Sep 3 01:27:45 CEST 2004
On Wed, 01 Sep 2004 14:25:42 +0200
Valient Gough wrote:
>
> Summary: bogotune has had 28 hours of CPU time so far on a 1.7GHz
> machine, taking 120MB of memory (no swapping). The process is CPU
> bound and only looks to be about half finished.
>
> First, let me say thanks for bogofilter - I've been using it for a
> long time and it has been great.
>
> I just upgraded from 0.92.0 to 0.92.6 and am seeing performance
> problems with bogotune. I have a couple thousand ham messages and
> approx 22 thousand spam messages in maildir folders (10:1 spam:ham is
> why I'm glad to have the combination of bogofilter and spamassassin
> for email).
>
> The wordlist file is 35MB in size. I've set the db_cachesize to 32MB,
> but that doesn't seem to have had any effect.
>
> I ran bogotune under 0.92.0 to get my current settings a few months
> ago, and that took on the order of 2 hours for the run. I decided to
> run again since I have a few thousand more messages to try and
> optimize against.
>
> Has bogotune become much slower in recent releases? Is there anything
> I can do to make it faster?
>
> regards,
> Valient Gough
Hello Valient,
Bogotune used to do its grid search with 3 parameters - robs, robx, and
min_dev. The first (coarse) pass checked 5 robs values, 5 robx values,
and 9 min_dev values for a total of 225 combinations (5*5*9). The second
(fine) pass typically checked 105 combinations (3*5*7).
With the addition of ESF (effective size factor), there are two
additional parameters to check - ns_esf (for non-spam) and sp_esf (for
spam). With 7 values for each one, every combination of the other
parameters is now tested 49 times (7*7), so the total number of
combinations is much higher. For the coarse scan there are 3675 passes
(rs: 3, rx: 5, md: 5, spex: 7, nsex: 7, i.e. 3*5*5*7*7) and there will
be a comparable number in the fine scan (though it may vary a bit).
Given the _much_ larger number of parameter combinations to check,
bogotune _is_ slower.
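
To make the arithmetic concrete, here is a small Python sketch that just
multiplies out the grid sizes quoted above. It is an illustration only, not
bogotune's code; the dictionary and function names are made up for the
example.

```python
from math import prod

# Grid sizes as quoted in this message. The names below are hypothetical
# labels for illustration and are not taken from the bogotune source.
OLD_COARSE = {"robs": 5, "robx": 5, "min_dev": 9}
OLD_FINE = {"robs": 3, "robx": 5, "min_dev": 7}
NEW_COARSE = {"robs": 3, "robx": 5, "min_dev": 5, "sp_esf": 7, "ns_esf": 7}


def combinations(grid):
    """Return the number of parameter combinations in a full grid search."""
    return prod(grid.values())


old_coarse = combinations(OLD_COARSE)
old_fine = combinations(OLD_FINE)
new_coarse = combinations(NEW_COARSE)

print("old coarse scan:", old_coarse)   # 225
print("old fine scan:  ", old_fine)     # 105
print("new coarse scan:", new_coarse)   # 3675
print(f"coarse scan grew by a factor of {new_coarse / old_coarse:.1f}")
```

The ratio only counts parameter combinations; it says nothing about how
long each combination takes to score.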
If you want to skip the ESF tests, run bogotune with the "-E" flag. I
usually use "-vv" when I run it, as that shows a reasonable level of
intermediate results and lets me see that bogotune is chugging along.
Another factor that affects speed is the number of messages used for
tuning. More messages give a better result and also take longer.
HTH,
David
P.S. I recommend subscribing to the mailing list (by sending a message
to bogofilter-subscribe at bogofilter.org). Messages from non-subscribers
typically get approved once a day (unless you get lucky).