Minimal rules [was: Test with different lexers]

Boris 'pi' Piwinger 3.14 at logic.univie.ac.at
Tue Dec 2 18:21:40 CET 2003


David Relson wrote:

> If you're really intent on simplifying the lexer, reduce the rules to a
> single rule which uses only whitespace for delimiters.  That would
> indicate what happens without any of the current special rules for
> processing html, ignoring binary attachments, tagging header lines.  I'd
> be interested in hearing how much smaller it is, how much faster it is,
> and what its scoring performance is.

That sounds like an approach that goes way too far. I
cannot see why you would skip decoding, since the encoding
is hidden from the recipient anyway. The idea, after all,
is to look at the text the reader will see. And headers are
clearly separated from the body, so the lexer should
reflect that.

Simplification does not mean being dumb.

pi




More information about the Bogofilter mailing list