effects of lexer changes

Matthias Andree matthias.andree at gmx.de
Sat Jan 4 13:17:45 CET 2003


On Fri, 03 Jan 2003, Gyepi SAM wrote:

> My plan is to tokenize everything that can be tokenized, including the
> prologue, which RFC 2046 calls the preamble, and the "postamble",
> and only skip unrecognized parts.  So I think we should look at the tokens of
> multipart/* and all the recognized subparts. For a multipart/alternative
> message with text and html parts, this essentially doubles the
> token count, but I don't think that should be cause for concern.

We should not count the same word twice. It distorts the counts.
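
To illustrate the point, here is a minimal sketch in Python (not
bogofilter's C lexer) that walks a MIME message and records each token
at most once per message, so that the text and html variants of a
multipart/alternative do not inflate the counts. The token pattern, the
crude tag stripping, and the names message_tokens and strip_tags are
assumptions made for this example only.

```python
import re
from email import message_from_string

# Hypothetical token pattern for illustration; the real lexer rules differ.
TOKEN_RE = re.compile(r"[A-Za-z0-9$'-]{3,}")


def strip_tags(html):
    """Very crude HTML tag removal, good enough for a sketch."""
    return re.sub(r"<[^>]+>", " ", html)


def message_tokens(raw):
    """Return the set of distinct body tokens in a raw message string.

    Every text/* part is tokenized, as are the preamble and epilogue of
    multipart containers. Because the result is a set, a word that shows
    up in both the text/plain and text/html alternatives is kept once.
    """
    msg = message_from_string(raw)
    seen = set()
    for part in msg.walk():
        if part.is_multipart():
            # Preamble and epilogue live on the multipart container.
            chunks = [part.preamble or "", part.epilogue or ""]
        elif part.get_content_maintype() == "text":
            payload = part.get_payload(decode=True) or b""
            charset = part.get_content_charset() or "ascii"
            try:
                text = payload.decode(charset, errors="replace")
            except LookupError:
                # Unknown charset label: fall back to ASCII with replacement.
                text = payload.decode("ascii", errors="replace")
            if part.get_content_subtype() == "html":
                text = strip_tags(text)
            chunks = [text]
        else:
            # Unrecognized parts are skipped.
            chunks = []
        for chunk in chunks:
            seen.update(tok.lower() for tok in TOKEN_RE.findall(chunk))
    return seen
```

Feeding it a multipart/alternative message whose two parts carry the
same words yields each word once, however many parts repeat it.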

> FYI, I had been working with an early version of the MIME parser, but stopped
> when a flurry of changes caused too many conflicts. I will continue working
> on it and will offer it up RSN.

Thanks in advance.

-- 
Matthias Andree



