[bogofilter] Improved Calculations
Boris 'pi' Piwinger
3.14 at logic.univie.ac.at
Thu May 13 13:46:26 CEST 2004
David Relson wrote:
>> > I read those. I have to say, that -- without knowing the
>> > theory -- this is something I really don't understand. The
>> > new parameters may have some intuition, but how they are
>> > used is magic. That makes it hard to guess good values, as
>> > the article explains, only testing seems to work.
>>
>> IMHO, at least the ham parameter should be computable from
>> the informational entropy of English text.
>>
>> And presumably the spam parameter would be computable
>> by similar means on a sufficient quantity of spam text.
>>
>> But the actual mapping from the entropy values to the ESF value
>> escapes me.
>
> I've not yet found the actual mapping from wordlist to robs, robx,
> min_dev, etc.
OK, but for those parameters there is a clear intuition (which
of course is a dangerous thing ;-), so we understand the
consequences of modifying a value.
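To make that intuition concrete, here is a minimal sketch of how robs, robx and min_dev are commonly described as acting on a single token, following Robinson's smoothing formula f(w) = (s*x + n*p(w)) / (s + n). The function names, the example numbers, and the handling of the exact min_dev boundary are assumptions for illustration, not bogofilter's actual code.

```python
def smoothed_prob(p_w, n, robs, robx):
    """Robinson-style smoothing of a token's raw spam estimate.

    p_w  : raw spam probability estimate for the token
    n    : how often the token has been seen in training
    robs : strength given to the prior (the 's' in the formula)
    robx : value assumed for a token never seen before (the 'x')
    """
    return (robs * robx + n * p_w) / (robs + n)


def is_significant(f_w, min_dev):
    """Keep only tokens far enough from 0.5 (boundary handling assumed)."""
    return abs(f_w - 0.5) >= min_dev


if __name__ == "__main__":
    # Illustrative values only, not recommended settings.
    for n in (0, 1, 10, 100):
        f = smoothed_prob(p_w=0.9, n=n, robs=0.1, robx=0.5)
        print(n, round(f, 4), is_significant(f, min_dev=0.1))
```

An unseen token (n = 0) scores exactly robx, a larger robs keeps rarely seen tokens closer to robx, and min_dev drops tokens whose smoothed score stays near 0.5. That is the sense in which changing these values has predictable consequences.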
> We've got a scanning tool in bogotune that will
> empirically find an answer.
Not for train-on-error.
> Wouldn't it be nice to have a mathematical formula?
It would.
> Finding the ESF values is no harder and no easier.
In train-on-error the values are built into the database (in
a subtle way, but they are in there). So I can easily shift
robx and the cutoffs to values I like. robs and min_dev are
harder, but that seems to be OK. For the new parameters I
don't even have an estimate.
pi