html comment processing

Boris 'pi' Piwinger 3.14 at logic.univie.ac.at
Tue Apr 1 08:59:48 CEST 2003


Herman Oosthuysen <Herman at WirelessNetworksInc.com> wrote:

>Well, it is interesting that Quanta+ interprets the whole line:
>
><br><!first> <!--second--> <!-->third<-->
>
>as comments and displays none of it, while Mozilla shows it as:
>
>third<-->
>
>So, Mozilla at least partially agrees with me...

There has been a long discussion (and several bug reports on
bugzilla.mozilla.org) on this issue. IIRC Mozilla behaves
differently depending on the DOCTYPE. But in any case, we are
not trying to build a standards-compliant HTML parser here. In
fact, spam will be written for Internet Exploder first and
foremost (and that means severely broken HTML parsing).

>>> As from some offline discussions, note this typedef tag at the start 
>>> of every HTML doc:
>>> <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0 Strict//EN">
>>>
>>> Bogofilter should discard the above typedef construct as a comment.

I think we should include that, i.e. discard the DOCTYPE
declaration the same way as a comment.
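
To make the idea concrete, here is a rough sketch of such lenient
comment stripping. It is written in Python purely for illustration and
is not bogofilter's actual lexer; the function name strip_comments and
both patterns are made up for this example. It assumes that anything
opening with "<!" should be dropped, whether it is a proper comment, a
DOCTYPE declaration, or a malformed tag like <!first>.

```python
import re

# Proper comments: "<!--" up to the next "-->" (non-greedy, may span lines).
FULL_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)

# Anything else that opens with "<!" and runs to the next ">",
# e.g. <!DOCTYPE ...> or the malformed <!first>.
DECLARATION = re.compile(r"<![^>]*>")


def strip_comments(html):
    """Drop comments and "<!...>" constructs (hypothetical helper)."""
    # Remove well-delimited comments first so their contents cannot
    # be mistaken for short "<!...>" constructs.
    html = FULL_COMMENT.sub("", html)
    return DECLARATION.sub("", html)


line = "<br><!first> <!--second--> <!-->third<-->"
doctype = '<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0 Strict//EN">'

print(repr(strip_comments(line)))                  # only <br> and spaces remain
print(repr(strip_comments(doctype + "<p>hi</p>"))) # DOCTYPE is discarded
```

With these patterns the test line from Herman's mail comes out the way
Quanta+ shows it: everything after <br> is treated as comment. Whether
that is the right choice for bogofilter is of course open to discussion.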

pi



