[bogofilter] Improved Calculations

Tom Allison tallison at tacocat.net
Thu May 13 22:29:04 CEST 2004


David Relson wrote:

>>Right. But "run test" can be extremely expensive ;-)
> 
> 
> Hi pi,
> 
> Indeed.  Bogotune's additional scanning (5 sp_esf values and 5 ns_esf
> values) has increased its workload by a factor of 25.  However, if you
> want demonstrably useful numbers, you've got to do the work.  Think of
> it as a task to keep your computer out of trouble while you're sleeping.
> 
> David

Like "bogo at home" ??  :)

I tried something that is statistically illegal but gave the same 
results in much less time when I was playing with the 
robx/robs/min_dev/subnet/ascii_char values.  For what it's worth, I'll 
share it, with the caveat that it was probably a bad thing to do, that I 
probably just got lucky, and that my statistics professor will hunt me 
down for posting this.

Keeping all the other variables fixed, I varied only one and found its 
best value.
Using that best value, I then varied another variable and found the best 
value for that one.
And so on.
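
The one-variable-at-a-time procedure described above can be sketched in 
Python roughly as follows.  This is only an illustration: toy_score() is 
a made-up stand-in for a real test run, the candidate values are 
examples, and the parameter names are simply borrowed from this message.  
Nothing here calls bogofilter or bogotune.

```python
from typing import Callable, Dict, Sequence, Tuple


def one_at_a_time(
    candidates: Dict[str, Sequence],
    start: Dict[str, object],
    score: Callable[[Dict[str, object]], float],
) -> Tuple[Dict[str, object], int]:
    """Tune each parameter in turn while holding the others fixed.

    Higher scores are better.  Returns the chosen settings and the
    number of test runs that were needed.
    """
    best = dict(start)
    runs = 0
    for name, values in candidates.items():
        best_value, best_score = best[name], float("-inf")
        for value in values:
            trial = dict(best)
            trial[name] = value
            result = score(trial)  # one (possibly expensive) test run
            runs += 1
            if result > best_score:
                best_value, best_score = value, result
        best[name] = best_value  # freeze the winner before moving on
    return best, runs


def toy_score(p: Dict[str, object]) -> float:
    # Hypothetical score in which every parameter contributes
    # independently; a real run would classify a test corpus instead.
    return (
        -abs(p["robx"] - 0.6)
        + (0.1 if p["block_on_subnets"] else 0.0)
        + (0.1 if p["replace_nonascii_chars"] else 0.0)
    )


candidates = {
    "robx": (0.4, 0.5, 0.6),
    "block_on_subnets": (False, True),
    "replace_nonascii_chars": (False, True),
}
start = {"robx": 0.4, "block_on_subnets": False, "replace_nonascii_chars": False}

best, runs = one_at_a_time(candidates, start, toy_score)
print(best, runs)
```

With this toy score the search needs 3 + 2 + 2 = 7 runs, where trying 
every combination would need 3 * 2 * 2 = 12.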

I started with robx=(1.0, 0.1, 0.01) and chose 1.0,
then tried block_on_subnets and replace_nonascii_chars and picked yes for 
both.
Finally I tried robx=(0.4, 0.5, 0.6) and picked 0.6.

This gave the same results as if I had gone through and selected the 
best set of test results for all three sample sets across all 24 
parameter combinations.  I ended up at the same place in much less time.

Statistically this could be highly illegal, unless these variables are 
independent of each other...
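
To show why independence matters, here is a small made-up Python example 
in which two settings interact.  The score table is purely hypothetical 
and has nothing to do with real bogofilter parameters; it only shows 
that tuning one variable at a time can stop at a worse answer than 
trying every combination.

```python
from itertools import product


def score(x: int, y: int) -> float:
    # Hypothetical interaction: the best result needs BOTH settings
    # switched on, and switching on just one makes things worse.
    if (x, y) == (1, 1):
        return 1.0
    if (x, y) == (0, 0):
        return 0.5
    return 0.0


# One variable at a time, starting from (0, 0).
x, y = 0, 0
x = max((0, 1), key=lambda v: score(v, y))  # y held fixed
y = max((0, 1), key=lambda v: score(x, v))  # x held fixed
one_at_a_time_best = (x, y)

# Full grid over every combination.
grid_best = max(product((0, 1), repeat=2), key=lambda p: score(*p))

print(one_at_a_time_best, score(*one_at_a_time_best))
print(grid_best, score(*grid_best))
```

Here the one-at-a-time search never leaves (0, 0), while the full grid 
finds (1, 1).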

If I somehow wasn't just lucky, then the cost of combining tests could 
be additive rather than multiplicative (2 sets of 5 tests would take 
5 + 5 = 10 runs and not 5 * 5 = 25).
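
The arithmetic can be checked with a few lines of Python.  The function 
names are invented for this illustration, and the count ignores any 
baseline run that the two approaches might share.

```python
from math import prod
from typing import Sequence


def runs_one_at_a_time(sizes: Sequence[int]) -> int:
    # Each parameter is scanned on its own, so the costs add up.
    return sum(sizes)


def runs_full_grid(sizes: Sequence[int]) -> int:
    # Every combination is tried, so the costs multiply.
    return prod(sizes)


sizes = [5, 5]  # e.g. 5 sp_esf values and 5 ns_esf values
print(runs_one_at_a_time(sizes), runs_full_grid(sizes))
```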



