0.92.6's bogotune much slower?

David Relson relson at osagesoftware.com
Fri Sep 3 01:27:45 CEST 2004


On Wed, 01 Sep 2004 14:25:42 +0200
Valient Gough wrote:

> 
> Summary: bogotune has had 28 hours of cpu time so far on a 1.7GHz
> machine, taking 120MB of memory (no swapping).  The process is CPU
> bound and only looks to be about half finished.  
> 
> First, let me say thanks for bogofilter - I've been using it for a
> long time and it has been great.
> 
> I just upgraded from 0.92.0 to 0.92.6 and am seeing performance
> problems with bogotune.  I have a couple thousand ham messages and
> approx 22 thousand spam messages in maildir folders (10:1 spam:ham is
> why I'm glad to have the combination of bogofilter and spamassassin
> for email).
> 
> The wordlist file is 35MB in size.  I've set the db_cachesize to 32MB,
> but that doesn't seem to have had any effect.
> 
> I ran bogotune under 0.92.0 to get my current settings a few months
> ago, and that took on the order of 2 hours for the run.  I decided to
> run again since I have a few thousand more messages to try and
> optimize against.
> 
> Has bogotune become much slower in recent releases?  Is there anything
> I can do to make it faster?
> 
> regards,
> Valient Gough

Hello Valient,

Bogotune used to do its grid search with 3 parameters - robs, robx, and
min_dev.  The first (coarse) pass checked 5 robs values, 5 robx values,
and 9 min_dev values for a total of 225 combinations.  The second (fine)
pass typically checked 105 combinations (3*5*7).

With the addition of ESF (effective size factor), there are two
additional parameters to check - ns_esf (for non-spam) and sp_esf (for
spam).  With 7 values for each one, the total number of combinations is
much higher.  The coarse scan now makes 3675 passes (rs: 3, rx: 5, md: 5,
spex: 7, nsex: 7), about 16 times the old 225, and there will be a
comparable number in the fine scan (though it may vary a bit).

Given the _much_ larger number of parameter combinations to check,
bogotune _is_ slower.
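
For anyone who wants to check the arithmetic, here is a rough sketch in
Python.  It is purely illustrative and is not bogotune code; the helper
name grid_size is made up for this example, and the per-parameter counts
are just the ones quoted in this message.

```python
from math import prod


def grid_size(counts):
    """Number of combinations in a full grid search,
    given how many values are tried for each parameter."""
    return prod(counts)


# Old coarse pass: 5 robs values, 5 robx values, 9 min_dev values.
old_coarse = grid_size([5, 5, 9])

# Old fine pass: typically 3*5*7.
old_fine = grid_size([3, 5, 7])

# Coarse pass with ESF: rs 3, rx 5, md 5, plus 7 values each
# for the spam and non-spam ESF parameters.
esf_coarse = grid_size([3, 5, 5, 7, 7])

print("old coarse pass:", old_coarse)
print("old fine pass:  ", old_fine)
print("ESF coarse pass:", esf_coarse)
print("coarse pass growth: %.1fx" % (esf_coarse / old_coarse))
```

The ratio only counts parameter combinations.  Actual run time also
depends on the number of messages used for tuning.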

If you want to skip the ESF tests, run bogotune with the "-E" flag.  I
usually use "-vv" when I run it as that shows a reasonable level of
intermediate results and lets me see that bogotune is chugging along.

Another factor that affects speed is the number of messages used for
tuning.  More messages give a better result and also take longer.

HTH,

David

P.S.  I recommend subscribing to the mailing list (via message to
bogofilter-subscribe at bogofilter.org).  Messages from non-subscribers
typically get approved once a day (unless you get lucky).


