Importance of dot in TOKEN
Boris 'pi' Piwinger
3.14 at logic.univie.ac.at
Fri Mar 19 13:53:19 CET 2004
David Relson wrote:
>> Since there are only a few test messages, there is
>> little to observe. There is a very, very small
>> indication that . might help in avoiding fp's. The number of
>> fn's seems slightly reduced by *not* using dots. This is a
>> surprise: I expected the dot version to clearly outperform
>> the much simpler lexer. It does not. So I am going to keep it out.
>
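To make the dot question concrete, here is a minimal sketch of the two
tokenizing variants being compared. The regular expressions and the
tokenize() helper are hypothetical illustrations; they are not
bogofilter's actual lexer rules, which are written for flex.

```python
import re

# Hypothetical patterns for illustration only (not bogofilter's rules).
# With dots: runs of alphanumerics may be joined by single interior dots.
TOKEN_WITH_DOT = re.compile(r"[A-Za-z0-9]+(?:\.[A-Za-z0-9]+)*")
# Without dots: every dot ends the current token.
TOKEN_NO_DOT = re.compile(r"[A-Za-z0-9]+")


def tokenize(text, allow_dot):
    """Split text into tokens, optionally keeping interior dots."""
    pattern = TOKEN_WITH_DOT if allow_dot else TOKEN_NO_DOT
    return pattern.findall(text)


sample = "Visit www.example.com from 192.168.0.1 today."
with_dot = tokenize(sample, allow_dot=True)
without_dot = tokenize(sample, allow_dot=False)
```

With dots allowed, the host name and the IP address each survive as one
token; without them, both are split into their components, so the
wordlist sees more but less specific tokens.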
> As a guess, as wordlists grow and become more comprehensive, each form
> of special treatment becomes less important. For example, we have
> header tagging and url identification. Removing one (or the other)
> would have some effect and removing both would have a larger effect.
Could be, but my much simplified lexer already has several
of those without any trouble (except that I don't understand
why people complain about random words;-).
> If we also removed decoding (base64, qp, etc) or multipart mime processing,
> results would change even more.
Certainly, this is a change of a different kind.
>> With this result in mind it will be interesting to see if IP
>> numbers are really useful. I'll keep you posted.
>
> I have found IP numbers to be useful, particularly when using
> "block_on_subnets=yes".
The question will be: Does it only look useful to the user, or
does it actually perform significantly better in the tests? I
have never used block_on_subnets, but it might be worth testing.
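
As a rough sketch of what subnet blocking could mean for the token
stream: besides the full address, each shorter prefix is emitted as its
own token, so a new address from a known subnet can still match. The
subnet_tokens() function and the "ip:" prefix are assumptions made for
this example, not a description of bogofilter's implementation.

```python
def subnet_tokens(ip, block_on_subnets=True):
    """Return tokens for an IPv4 address (hypothetical helper).

    With block_on_subnets enabled, the three shorter prefixes are added
    after the full address. The "ip:" prefix is an assumed convention.
    """
    octets = ip.split(".")
    if len(octets) != 4 or not all(o.isdigit() and int(o) <= 255 for o in octets):
        raise ValueError("not a dotted-quad IPv4 address: %r" % ip)
    if not block_on_subnets:
        return ["ip:" + ip]
    # Four octets first, then three, two, and one.
    return ["ip:" + ".".join(octets[:n]) for n in range(4, 0, -1)]


tokens = subnet_tokens("192.168.0.1")
single = subnet_tokens("192.168.0.1", block_on_subnets=False)
```

Comparing test runs with and without the extra prefix tokens would show
whether they change the fp and fn counts at all.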
pi