HTML comment idea

Matthias Andree matthias.andree at gmx.de
Sun Jan 19 02:30:49 CET 2003


David Relson <relson at osagesoftware.com> writes:

> Do people want a feature like this?  What should the defaults be?  Would
> someone like to do the implementation?

Hm, I think we should drop the HTML comments and reparse the resulting
text (we won't get away with a single-pass approach for HTML unless we
add explicit token-merging code, which I wouldn't like: too complex).

We might try to turn the HTML lexer into a preprocessor that feeds into
the plain text lexer.
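
To make the idea concrete, here is a rough sketch in Python (not
bogofilter's actual lexer, which is a different beast). The names
strip_html_comments, plain_text_tokens and tokenize_html are made up
for illustration, the "lexer" is a toy that only recognizes runs of
letters and digits, and it assumes a comment always runs from "<!--" to
the next "-->"; malformed comments would need more care.

```python
import re

# Assumption: a comment runs from "<!--" to the next "-->".
# Real-world HTML (and spam) may contain malformed comments.
COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)

# Hypothetical stand-in for the plain text lexer:
# a token is simply a run of lowercase letters and digits.
TOKEN_RE = re.compile(r"[a-z0-9]+")


def strip_html_comments(text):
    """Preprocessor step: delete every HTML comment from the input."""
    return COMMENT_RE.sub("", text)


def plain_text_tokens(text):
    """Toy plain text lexer: lowercase the text and split into tokens."""
    return TOKEN_RE.findall(text.lower())


def tokenize_html(text):
    """Two-pass pipeline: strip comments first, then run the text lexer."""
    return plain_text_tokens(strip_html_comments(text))


sample = "Buy vi<!-- lunch meeting -->ag<!-- agenda -->ra now"

# Lexing the raw input splits the word and picks up the comment text.
print(plain_text_tokens(sample))

# Lexing after the preprocessor sees the word in one piece.
print(tokenize_html(sample))
```

Lexing the raw sample yields the fragments "vi", "ag" and "ra" plus the
words hidden in the comments; with the comments removed first, the
fragments merge into one token and the comment text never reaches the
lexer.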

The comments aren't displayed, and although they may now be indicative
of spam in some cases, in the long run spammers will try to put "ham"
text into them to deceive filters.

For the same reasons, I don't think we should do any counting. This is
something SpamAssassin is good at, and we should not try to duplicate
it.

My vote therefore is: no options, kill HTML comments. No further
features before 0.10; removing HTML comments is a bug fix against
split-up tokens and qualifies for 0.10. ;-)

-- 
Matthias Andree



