[bogofilter] Improved Calculations

michael at optusnet.com.au
Thu May 13 04:33:57 CEST 2004


"Boris 'pi' Piwinger" <3.14 at logic.univie.ac.at> writes:
> David Relson wrote:
> 
> > Gary Robinson's blog has a new "Improved Chi" article (at
> > http://www.garyrobinson.net/2004/04/improved_chi.html) which points to
> > his new paper, "Handling Redundancy in Email Token Probabilities" (at
> > http://garyrob.blogs.com//handlingtokenredundancy93.pdf).  
> 
> I read those. I have to say, that -- without knowing the
> theory -- this is something I really don't understand. The
> new parameters may have some intuition, but how they are
> used is magic. That makes it hard to guess good values, as
> the article explains, only testing seems to work.

IMHO, at least the ham parameter should be computable from
the informational entropy of English text.

And presumably the spam parameter would be computable
by similar means from a sufficient quantity of spam text.

But the actual mapping from the entropy values to the ESF value
escapes me.
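
For the entropy half of that idea, here is a minimal sketch in Python,
assuming a crude lowercase word tokenizer (not bogofilter's lexer) and a
plain unigram model. The function name token_entropy is made up for
illustration. It only measures bits per token for a body of text; it
does not produce an ESF value, since that mapping is the unknown part,
and a unigram estimate ignores correlation between neighbouring tokens.

```python
import math
import re
from collections import Counter


def token_entropy(text):
    """Shannon entropy, in bits per token, of the unigram token distribution.

    Assumption: tokens are runs of letters, digits and apostrophes after
    lowercasing. This is a stand-in tokenizer, not bogofilter's.
    """
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    total = len(tokens)
    # H = -sum(p * log2(p)) over the observed token frequencies
    return -sum((n / total) * math.log2(n / total) for n in counts.values())
```

Running this over a large ham corpus and a large spam corpus separately
would give the two entropy figures to compare.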

Michael.


