html comment processing

David Relson relson at osagesoftware.com
Mon Mar 31 20:57:53 CEST 2003


Herman,

I don't think it's quite as simple as you say.

At 01:43 PM 3/31/03, Herman Oosthuysen wrote:

><br>one tw<!--this is a comment-->o three
>This comment is "--this is a comment--"
>
>
><br><!first>
>This comment is "first"

The spec says "Each comment starts with `--' and includes all text up to 
and including the next occurrence of `--'".

Because a pair of hyphens delimits the comment on each side, the comment 
itself can contain almost anything.  Specifically, it can include angle 
brackets, but not "--".

><!-->
>This comment is "--"
>
>third<-->
>This is an illegal tag and should be discarded

Given the spec text quoted above, "<!--second-->" has the comment "second", 
since that's what lies between the pairs of hyphens, and "<!-->third<-->" 
has the comment ">third<". Also, "--" can't be a comment on its own, and 
"<-->" would be invalid HTML, except that in this case it's the end of a 
comment.
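
To make that reading concrete, here is a minimal sketch in Python. It 
assumes a comment declaration is "<!", then zero or more "--...--" comments 
(each optionally followed by whitespace), then ">". The function name is 
hypothetical and this is not bogofilter's code, just an illustration of the 
rule quoted above.

```python
def parse_comment_declaration(text, start=0):
    """Parse a comment declaration at text[start] under the strict reading.

    Returns (comments, end_index) on success, or None if the text at
    `start` is not a well-formed comment declaration.
    Hypothetical helper for illustration only.
    """
    if not text.startswith("<!", start):
        return None
    pos = start + 2
    comments = []
    while True:
        if text.startswith("--", pos):
            # A comment runs up to the next occurrence of "--".
            close = text.find("--", pos + 2)
            if close == -1:
                return None  # opening "--" with no closing "--"
            comments.append(text[pos + 2:close])
            pos = close + 2
            # Whitespace may follow a comment inside the declaration.
            while pos < len(text) and text[pos].isspace():
                pos += 1
        elif text.startswith(">", pos):
            return comments, pos + 1
        else:
            return None  # anything else is not allowed here


print(parse_comment_declaration("<!--second-->"))   # (['second'], 13)
print(parse_comment_declaration("<!-->third<-->"))  # (['>third<'], 14)
print(parse_comment_declaration("<!-->"))           # None
```

Under these assumptions "<!-->" alone is rejected, because the opening "--" 
never finds its closing "--".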

>I have read many horrible specs in various languages over the years. The 
>HTML spec is quite clear to me in that "<!" and ">" are the comment 
>delimiters.  I do not think that there is any ambiguity about it.

I think we've demonstrated that there are at least two interpretations.
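
For comparison, here is a sketch of the other reading, where "<!" and ">" 
are taken as the only delimiters. Again the function name is hypothetical 
and the code is only an illustration. On "<!-->third<-->" it stops at the 
first ">" and reports the comment "--", where the hyphen-pair reading gives 
">third<".

```python
def naive_comment(text, start=0):
    """Treat everything between "<!" and the next ">" as the comment.

    Returns (comment, end_index), or None if no such span starts at
    `start`. Hypothetical helper for illustration only.
    """
    if not text.startswith("<!", start):
        return None
    end = text.find(">", start + 2)
    if end == -1:
        return None
    return text[start + 2:end], end + 1


sample = "<!-->third<-->"
comment, end = naive_comment(sample)
print(comment)       # --
print(sample[end:])  # third<-->
```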

David




