[bogofilter] Improved Calculations
tallison at tacocat.net
Wed May 5 21:31:36 EDT 2004
David Relson wrote:
> On Wed, 05 May 2004 18:25:20 -0400
> Tom Allison wrote:
>>I see this getting really ugly really fast.
>>How do spam_cutoff/ham_cutoff and sp_esf/ns_esf inter-relate to each
>>other?
> In some ways this is definitely so. In its first scan bogotune uses 5
> values of robs, 5 of robx, and 9 min_dev for a total of 225
> combinations. Adding 5 for sp_esf and 5 for ns_esf increases the count
> by a factor of 25, i.e. to 5625. So the comprehensive scan becomes time
> consuming.
> spam_cutoff and ham_cutoff are pretty much separate from one another,
> except for the obvious -- ham_cutoff must be less than or equal to
> spam_cutoff. The two sp_esf and ns_esf factors are separate from one
> another, and separate from robs, robx, and min_dev. However, as we've
> learned, the various factors interact with one another in complex
> non-obvious ways, which means that we can't arbitrarily change one and
> still expect the best performance. A parameter tester, like bogotune,
> is needed to test the combinations and determine which combo works best
> for the data (messages) being tested.
> Initially, at least, bogofilter will just use 1.0 for both sp_esf and
> ns_esf. That will give the same answers as not using ESF. Gary
> Robinson has shown that this new idea has merit, and Greg Louis has
> confirmed the merit. Somewhere down the road, as experiments are run
> and we learn more, other values are likely to be used.
>>I got through part of the article this morning but haven't had a
>>chance to complete it. I'm not very good at statistics. At least,
>>not that good.
> I'm not a statistician either. I took an introductory statistics course
> in college in the '60s, have had little need since then, and have
> forgotten virtually all of it. Fortunately there are others, like Gary
> Robinson and Greg Louis, who have the knowledge and the skill to apply
> it to the problem at hand -- identifying spam.
> Hope this helps!
It does, on several levels.
I'll help run tests...
I can do that!
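For reference, the combination counts quoted above (225 for the first scan, 5625 once sp_esf and ns_esf are added) can be reproduced with a short sketch. Only the number of values per parameter (5, 5, 9, 5, 5) comes from the message; the values in the lists below are made-up placeholders, not bogotune's actual scan values, and this is not how bogotune itself is implemented.

```python
from itertools import product

# Placeholder values: only the list lengths (5, 5, 9, 5, 5) are taken
# from the message. The numbers themselves are hypothetical.
robs_values = [0.001, 0.003, 0.01, 0.03, 0.1]
robx_values = [0.40, 0.45, 0.50, 0.55, 0.60]
min_dev_values = [0.02, 0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40]
sp_esf_values = [0.2, 0.4, 0.6, 0.8, 1.0]
ns_esf_values = [0.2, 0.4, 0.6, 0.8, 1.0]

# Every (robs, robx, min_dev) combination in the first scan.
base_grid = list(product(robs_values, robx_values, min_dev_values))

# The same grid extended with the two ESF parameters.
full_grid = list(product(robs_values, robx_values, min_dev_values,
                         sp_esf_values, ns_esf_values))

print(len(base_grid))                    # 225
print(len(full_grid))                    # 5625
print(len(full_grid) // len(base_grid))  # 25
```

Each extra parameter multiplies the grid size by its number of candidate values, which is why an exhaustive scan gets slow so quickly.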