What to do for HTML comment processing ???

Tim Freeman tim at fungible.com
Tue Mar 4 05:30:16 CET 2003


From: Matthias Andree <matthias.andree at gmx.de>
>David Relson <relson at osagesoftware.com> writes:
>
>> Should bogofilter simply forget that dashes are in the html standard and
>> treat "<!whatever is inside the angle brackets>" as a comment?
>
>It's unprintable in any case in the structured languages (HTML, SGML,
>XML), so just kill it.
>
>> Probably the lexer can be written so that proper comments (with
>> leading/trailing dashes) are recognized as well as comments without any
>> dashes.  The proper treatment of "<!-- proper start, improper end>"
>> remains a question.
>
>Kill it.

Hi.  I'm new here.

I agree with Matthias.  Of my saved spam messages that contain HTML
comments, about 25% write them in the form <!yuck>, with no hyphens.
I have yet to see an example in the wild where throwing out comments
that lack the hyphens causes a problem.
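
To make the "kill it" policy concrete, here is a minimal sketch in
Python.  It is not bogofilter's lexer, and the names COMMENT_RE and
strip_html_comments are made up for illustration.  It assumes a proper
comment runs through the first "-->", and that anything else starting
with "<!" ends at the next ">".

```python
import re

# Hypothetical pattern, not taken from bogofilter's lexer.
# First alternative: a proper comment, "<!--" through the first "-->".
# Second alternative: anything else that opens with "<!", up to the
# next ">".  This also covers "<!-- proper start, improper end>" when
# no "-->" follows later in the text.
COMMENT_RE = re.compile(r"<!--.*?-->|<![^>]*>", re.DOTALL)


def strip_html_comments(text):
    """Remove proper and dashless HTML comments from text."""
    return COMMENT_RE.sub("", text)


if __name__ == "__main__":
    samples = [
        "vi<!yuck>agra",
        "a<!-- note -->b",
        "a<!-- proper start, improper end>b",
    ]
    for sample in samples:
        print(repr(strip_html_comments(sample)))
```

Two caveats with this sketch: it also throws away declarations such
as <!DOCTYPE ...>, which are unprintable anyway, and an improperly
ended comment that is followed later by some unrelated "-->" will
swallow the text in between.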

-- 
Tim Freeman       
tim at fungible.com
GPG public key fingerprint ECDF 46F8 3B80 BB9E 575D  7180 76DF FE00 34B1 5C78 



