0.92.6's bogotune much slower?

Valient Gough vgough at pobox.com
Sun Sep 5 14:36:13 CEST 2004


On Sun, 2004-09-05 at 14:13, David Relson wrote:

> On Sun, 05 Sep 2004 10:55:55 +0200
> Valient Gough wrote:



> > I just started a run with ESF disabled, and I notice that instead of
> > using the ESF parameters that are in the configuration file, disabling
> > ESF instead forces spesf and nsesf to 1.0.
> 
> You're right.  Having bogotune use the already known ESF values _would_
> be faster.  That's the up side.  On the down side, with modified tokens
> in the wordlist and with a different set of messages used in tuning, the
> old ESF values may no longer be optimal.  So running bogotune with old
> ESF values is not ideal.  Of course, using the default ESF values of 1.0
> isn't ideal either.


Right, but I may want to retrain from time to time without committing a
week's worth of CPU time.  So if the existing ESF values will work, I'd
rather use them than the defaults.  Either that, or allow a small search
space around the existing values -- but not if it will cost days of CPU
time!  This is starting to sound like a parameter transfer problem
between related but not identical datasets.
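
To illustrate the "small search space around the existing values" idea,
here is a minimal sketch.  It is not bogotune code: the function names,
the geometric spacing, the step count, and the upper bound of 1.0 are
all assumptions made for illustration only.

```python
def local_grid(center, factor=1.25, steps=2, upper=1.0):
    """Return candidate values spaced geometrically around `center`.

    Hypothetical helper: `factor`, `steps`, and the `upper` clip of 1.0
    are illustrative assumptions, not bogotune's actual scan settings.
    """
    # Build center * factor**k for k in [-steps, steps], clipped at `upper`.
    # A set removes duplicates created by the clipping.
    candidates = {min(center * factor ** k, upper)
                  for k in range(-steps, steps + 1)}
    return sorted(candidates)


def search_size(*grids):
    """Number of parameter combinations in the cross product of the grids."""
    total = 1
    for grid in grids:
        total *= len(grid)
    return total


# Example with made-up previous ESF values (placeholders, not real results).
sp_grid = local_grid(0.5, factor=2.0, steps=1)
ns_grid = local_grid(1.0, factor=2.0, steps=1)
combos = search_size(sp_grid, ns_grid)
```

The point of the sketch is only that a narrow grid around two known
values multiplies the scan cost by a small constant (the size of the
cross product) rather than by the size of a full ESF scan.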


> 
> In normal operation, since bogotune is looking for a fresh, new set of
> parameters, it doesn't use the old parameter settings, hence doesn't
> read a config file.  The good news is that bogotune is able to read a
> config file and will use the old ESF values.  Run "bogotune -?" to
> display the help message (which includes the option for reading the
> config file).  Then run bogotune with config file and "-E" and it'll do
> as you want.


Perhaps this has been fixed already, but in 0.92.6 it does not behave
this way.  It ignored the config values even though I told it to read a
config file, which is why I sent the previous mail suggesting that it
use the existing values for untuned parameters.  After I modified the
init_coarse and init_fine functions to use the existing values, it
worked how I expected (and how you describe).

One more thing:  after my latest run with -E (and bogotune modified to
use existing ESF values), the result seems to me better than that of the
full run, which included ESF tuning.  The dataset was nearly identical
(except for the extra spam messages received over the course of 5 days
while the full run was grinding away).

Some results from the full run:
spam_cutoff=0.986189    # for 0.10% fp (1); expect 3.61% fn (824).
#spam_cutoff=0.981340   # for 0.20% fp (2); expect 2.17% fn (497).

And results from the partial run (-E), using the ESF values calculated
from the full run above:
spam_cutoff=0.990644    # for 0.10% fp (1); expect 4.86% fn (1143).
#spam_cutoff=0.970202   # for 0.20% fp (2); expect 0.80% fn (187).

So the search solution doesn't seem particularly stable, which is why I
expect to have to run bogotune periodically to adjust parameters (and
why I'd rather not have a 5-day run each time)...
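
To make the comparison above concrete, the following small sketch
tabulates the four spam_cutoff lines quoted in this message.  The
numbers are copied from the bogotune output above; the variable and
function names are hypothetical, and the "implied spam count" is only a
back-of-the-envelope estimate derived from the rounded percentages.

```python
# (fp percent, fn percent, fn count) copied from the bogotune output above.
runs = {
    "full": [(0.10, 3.61, 824), (0.20, 2.17, 497)],
    "partial (-E)": [(0.10, 4.86, 1143), (0.20, 0.80, 187)],
}


def implied_spam_count(fn_pct, fn_count):
    """Estimate the number of spam messages scored: fn_count / fn_rate."""
    return round(fn_count / (fn_pct / 100.0))


def fn_deltas(full_rows, partial_rows):
    """False-negative count difference (partial - full) per fp target."""
    full = {fp: fn for fp, _, fn in full_rows}
    partial = {fp: fn for fp, _, fn in partial_rows}
    return {fp: partial[fp] - full[fp] for fp in full}


deltas = fn_deltas(runs["full"], runs["partial (-E)"])

for name, rows in runs.items():
    for fp_pct, fn_pct, fn_count in rows:
        print(f"{name:13s} fp={fp_pct:.2f}%  fn={fn_count:5d}  "
              f"implied spam ~{implied_spam_count(fn_pct, fn_count)}")
print("fn change (partial - full):", deltas)
```

Read this way, the partial run misses more spam at the 0.10% fp target
and less at the 0.20% target, which is the instability referred to
above.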

regards,
Valient
