HTML comment idea

David Relson relson at osagesoftware.com
Sun Jan 19 02:43:31 CET 2003


At 08:30 PM 1/18/03, Matthias Andree wrote:

>David Relson <relson at osagesoftware.com> writes:
>
> > Do people want a feature like this?  What should the defaults be?  Would
> > someone like to do the implementation?
>
>Hum, I'd think we'd drop the HTML comments and reparse the resulting
>stuff (we won't get away with a single-pass approach for HTML unless we
>add explicit token merging code which I wouldn't like -- too complex).
>
>We might try to turn the HTML lexer into a preprocessor that feeds into
>the plain text lexer.
>
>The comments aren't displayed, and although they may now be indicative
>of spam in some cases, in the long run spammers will try to put "ham"
>tests into them so as to deceive filters.
>
>For the same reasons, I don't think we should do any counting. This is
>something SpamAssassin is good at, and we should not try to duplicate
>it.
>
>My vote therefore is: no options, kill HTML comments. No further
>features before 0.10; removing HTML comments is a bug fix against
>split-up tokens and qualifies for 0.10. ;-)
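
To make the split-token problem concrete, here is a minimal sketch in Python
(not bogofilter's actual lexer, which is written in C). It assumes a comment is
anything between "<!--" and the next "-->", and it uses a deliberately crude
letters-only tokenizer. The names strip_html_comments and tokenize are
hypothetical, chosen for illustration only.

```python
import re

# Assumption: an HTML comment runs from "<!--" to the nearest "-->".
# Real-world HTML has more edge cases than this pattern covers.
COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)


def strip_html_comments(text):
    """Preprocessing pass: drop comments so the pieces around them rejoin."""
    return COMMENT_RE.sub("", text)


def tokenize(text):
    """Stand-in for the plain text lexer: lowercase runs of ASCII letters."""
    return [t.lower() for t in re.findall(r"[A-Za-z]+", text)]


body = "Buy via<!-- xyzzy -->gra now"

# Without the preprocessing pass, the comment splits the word and its
# contents leak into the token stream.
print(tokenize(body))

# With comments removed first, the split-up word is seen as one token.
print(tokenize(strip_html_comments(body)))
```

Running the comment removal as a separate pass ahead of tokenizing is the
"preprocessor that feeds into the plain text lexer" arrangement described
above; it avoids any explicit token-merging logic in the tokenizer itself.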

I'd like to include your database fixes in 0.10.  I know you're working on 
some stuff.  When do you think it'll be ready for a 0.10-beta?

As the HTML comment processing is coming along nicely, I'll likely also
include whatever I have at that time.

David




