Importance of dot in TOKEN

Boris 'pi' Piwinger 3.14 at logic.univie.ac.at
Fri Mar 19 13:53:19 CET 2004


David Relson wrote:

>> Since there are only a few test messages, there is
>> little to observe. There is a very, very small
>> indication that . might help in avoiding fp's. The number of
>> fn's seems reduced a bit by *not* using dots. This is a
>> surprise: I expected the dot version to clearly outperform
>> the much simpler lexer. It does not, so I am going to keep it out.
> 
> As a guess, as wordlists grow and become more comprehensive, each form
> of special treatment becomes less important.  For example, we have
> header tagging and url identification.  Removing one (or the other)
> would have some effect and removing both would have a larger effect. 

Could be, but my much simplified lexer already has several
of those without any trouble (except that I don't understand
why people complain about random words;-).
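
To make the difference concrete, here is a minimal sketch of what
allowing or disallowing the dot inside a token means. The two
patterns and the tokenize() helper are hypothetical illustrations;
they are not bogofilter's lexer rules and not the rules of the
simplified lexer either.

```python
import re

# Hypothetical patterns for illustration only.
# With dots: runs of alphanumerics joined by single inner dots stay one token.
TOKEN_WITH_DOT = re.compile(r"[A-Za-z0-9]+(?:\.[A-Za-z0-9]+)*")
# Without dots: every dot is a separator.
TOKEN_NO_DOT = re.compile(r"[A-Za-z0-9]+")


def tokenize(text, allow_dot):
    """Split text into tokens, optionally keeping inner dots."""
    pattern = TOKEN_WITH_DOT if allow_dot else TOKEN_NO_DOT
    return pattern.findall(text)


sample = "visit www.example.com now."
print(tokenize(sample, allow_dot=True))
print(tokenize(sample, allow_dot=False))
```

With the dot, a host name stays a single, rather specific token;
without it, the wordlist only sees the generic pieces.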

> If we also removed decoding (base64, qp, etc) or multipart mime processing,
> results would change even more.

Certainly, this is a change of a different kind.

>> With this result in mind it will be interesting to see if IP
>> numbers are really useful. I'll keep you posted.
> 
> I have found IP numbers to be useful, particularly when using
> "block_on_subnets=yes".

The question will be: Does it merely look useful to the user, or
does it actually perform significantly better in the tests? I
have never used block_on_subnets, but it might be worth testing.
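
For anyone wanting to test this, a rough sketch of the idea as I
understand it: with subnet blocking, one IP number yields a token
for the full address plus one for each shorter prefix. The
ip_tokens() helper and the "ip:" token format are assumptions made
for illustration; check the bogofilter source for what it really
emits.

```python
def ip_tokens(address, block_on_subnets):
    """Return the tokens generated for one dotted IPv4 address.

    Assumption: tokens carry an "ip:" prefix and, with subnet
    blocking enabled, each shorter leading prefix is added as well.
    """
    octets = address.split(".")
    if not block_on_subnets:
        return ["ip:" + address]
    # Full address first, then progressively shorter prefixes.
    return ["ip:" + ".".join(octets[:n]) for n in range(len(octets), 0, -1)]


print(ip_tokens("192.0.2.17", block_on_subnets=True))
print(ip_tokens("192.0.2.17", block_on_subnets=False))
```

The prefix tokens let a message from a new address in an already
seen network still match something in the wordlist, which is the
part that would need to show up in the fp/fn counts.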

pi



