[bogofilter] Improved Calculations

Tom Allison tallison at tacocat.net
Wed May 5 21:31:36 EDT 2004


David Relson wrote:
> On Wed, 05 May 2004 18:25:20 -0400
> Tom Allison wrote:
> 
> ...[snip]...
> 
> 
>>I see this getting really ugly really fast.
>>How does spam_cutoff/ham_cutoff and sp_esf/ns_esf inter-relate to each
>>other?
> 
> 
> In some ways this is definitely so.  In its first scan bogotune uses 5
> values of robs, 5 of robx, and 9 of min_dev for a total of 225
> combinations.  Adding 5 for sp_esf and 5 for ns_esf increases the count
> by a factor of 25, i.e. to 5625.  So the comprehensive scan becomes time
> consuming.
> 
> spam_cutoff and ham_cutoff are pretty much separate from one another,
> except for the obvious -- ham_cutoff must be less than or equal to
> spam_cutoff.  The two sp_esf and ns_esf factors are separate from one
> another, and separate from robs, robx, and min_dev.  However, as we've
> learned, the various factors interact with one another in complex
> non-obvious ways, which means that we can't arbitrarily change one and
> still expect the best performance.  A parameter tester, like bogotune,
> is needed to test the combinations and determine which combo works best
> for the data (messages) being tested.
> 
> Initially, at least, bogofilter will just use 1.0 for both sp_esf and
> ns_esf.  That will give the same answers as not using ESF.  Gary
> Robinson has shown that this new idea has merit, and Greg Louis has
> confirmed the merit.  Somewhere down the road, as experiments are run
> and we learn more, other values are likely to prove useful.
> 
> 
>>I got through part of the article this morning but haven't had a
>>chance to complete it.  I'm not very good at statistics.  At least,
>>not that good.
> 
> 
> I'm not a statistician either.  I took an introductory statistics course
> in college in the '60's, have had little need since then, and have
> forgotten virtually all of it.  Fortunately there are others, like Gary
> Robinson and Greg Louis, who have the knowledge and the skill to apply
> it to the problem at hand -- identifying spam.
> 
> Hope this helps!
> 
> David
> 

It does, on several levels.

I'll help run tests...
I can do that!
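
For anyone wanting to check the combination counts David quotes above, here is a minimal sketch. It only reproduces the arithmetic (5 x 5 x 9 = 225, then x 25 = 5625). The grid sizes come from his message; the helper names are made up for illustration, and nothing here reflects how bogotune actually walks its grid.

```python
from itertools import product
from math import prod

# Grid sizes as quoted in the message; the actual values bogotune tries
# are not given there, so index ranges stand in for them.
FIRST_SCAN_SIZES = {"robs": 5, "robx": 5, "min_dev": 9}
ESF_SIZES = {"sp_esf": 5, "ns_esf": 5}


def count_combinations(sizes):
    """Number of points in a full grid with the given per-parameter sizes."""
    return prod(sizes.values())


def enumerate_grid(sizes):
    """Yield every grid point as a dict of parameter name -> value index."""
    names = list(sizes)
    for indices in product(*(range(n) for n in sizes.values())):
        yield dict(zip(names, indices))


base = count_combinations(FIRST_SCAN_SIZES)
full = count_combinations({**FIRST_SCAN_SIZES, **ESF_SIZES})
print(base, full, full // base)
```

Each added parameter multiplies the number of runs, which is why a full scan gets slow so quickly.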

