[bogofilter] Improved Calculations

Boris 'pi' Piwinger 3.14 at logic.univie.ac.at
Wed May 12 14:05:28 CEST 2004


David Relson wrote:

> Gary Robinson's blog has a new "Improved Chi" article (at
> http://www.garyrobinson.net/2004/04/improved_chi.html) which points to
> his new paper, "Handling Redundancy in Email Token Probabilities" (at
> http://garyrob.blogs.com//handlingtokenredundancy93.pdf).  

I read those. I have to say that, without knowing the
theory, this is something I really don't understand. The
new parameters may have some intuition behind them, but how
they are used is magic. That makes it hard to guess good
values; as the article explains, only testing seems to work.

All this makes it almost impossible to understand what these
parameters do in train-on-error. Using bogotune is not an
option, as it is made to reject this setting; it also would
not really do what is needed here (namely building a new
database for every single combination of parameters).
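
A minimal sketch of why train-on-error needs a fresh database
for every parameter combination. This is not bogofilter's
calculation and not bogotune: the toy scorer and the names
`cutoff`, `strength`, `train_on_error` and `sweep` are
assumptions made up for illustration. The point is only that
which messages get registered depends on the parameters, so
the database cannot be shared between settings.

```python
from collections import Counter
from itertools import product


def score(db, tokens, strength):
    """Toy spam score in [0, 1]; a stand-in, NOT bogofilter's formula."""
    if not tokens:
        return 0.5
    total = 0.0
    for t in tokens:
        spam = db["spam"][t]
        ham = db["ham"][t]
        # Pull the ratio toward 0.5 when the token has little data.
        total += (strength * 0.5 + spam) / (strength + spam + ham)
    return total / len(tokens)


def train_on_error(messages, cutoff, strength):
    """One pass over the corpus, registering only misclassified messages."""
    db = {"spam": Counter(), "ham": Counter()}  # fresh database per run
    errors = 0
    for tokens, is_spam in messages:
        predicted_spam = score(db, tokens, strength) >= cutoff
        if predicted_spam != is_spam:
            errors += 1
            db["spam" if is_spam else "ham"].update(tokens)
    return errors, db


def sweep(messages, cutoffs, strengths):
    """Rebuild the database from scratch for each parameter combination."""
    results = {}
    for cutoff, strength in product(cutoffs, strengths):
        errors, _ = train_on_error(messages, cutoff, strength)
        results[(cutoff, strength)] = errors
    return results
```

Running `train_on_error` on the same messages with two
different `cutoff` values can register different messages, so
the resulting databases differ even when the error counts
happen to match.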

Maybe someone has some intuition about how these parameters
might be useful in train-on-error. I would like to hear it.

> Greg Louis has tested the technique and found it valuable.  His writeup
> "Token Redundancy and the Effective Size Factor" can be found at
> http://www.bgl.nu/bogofilter/esf.html

As far as I understand the result table, it does not give
information on two-state mode. That would be interesting (I
assume unsures are not counted as errors there).

pi


