HTML parsing

Boris 'pi' Piwinger 3.14 at logic.univie.ac.at
Wed Nov 26 14:29:41 CET 2003


David Relson wrote:

> < given the left angle bracket in this line, an html parser would think
> it's an html tag.  Since bogofilter ignores the innards of invalid html
> tags, this is another non-message.

Right, a bad one.
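
To make the effect concrete, here is a minimal sketch in Python. It is not bogofilter's flex lexer, only a toy under stated assumptions: the regex TAG_RE and the helper names strip_tags and tokens are made up for illustration, and the toy treats everything from a '<' up to the next '>' as a tag whose innards are dropped.

```python
import re

# Assumption (illustration only): a "tag" is anything from '<' to the next '>'.
# Bogofilter's real lexer is written in flex and is more involved.
TAG_RE = re.compile(r"<[^<>]*>")


def strip_tags(text):
    """Replace every <...> span with a space, discarding its innards."""
    return TAG_RE.sub(" ", text)


def tokens(text):
    """Split the tag-stripped text on whitespace."""
    return strip_tags(text).split()


if __name__ == "__main__":
    # The stray '<' swallows "b then stop" along with the real tags.
    print(tokens("if a < b then stop > everything <b>bold</b>"))
```

In this toy, the words between the stray '<' and the next '>' never reach the tokenizer, which is the kind of lost text the quoted paragraph describes.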

> The lexer size would decrease by a small amount.  The DOCTYPE rule would
> go away, but most everything else would still be needed.  I tried the
> experiment with the current lexer (lexer_v3.l.1.125).  Here are the
> numbers:

Thanks for testing.

> P.S.  There's no need to CC me on messages.  It forces me to delete the
> duplicate copy.

I double-checked my sent folder and don't see that I did.

pi



