html comment processing

Herman Oosthuysen Herman at WirelessNetworksInc.com
Tue Apr 1 01:51:31 CEST 2003


Well, it is interesting that Quanta+ interprets the whole line:

<br><!first> <!--second--> <!-->third<-->

as comments and displays none of it, while Mozilla shows it as:

third<-->

So, Mozilla at least partially agrees with me...

Does anybody care to try IE and Opera?


David Relson wrote:
> At 06:06 PM 3/31/03, Herman Oosthuysen wrote:
> 
>>>> > > <!-->third<-->
>>>> >
>>>> > Again "<!-->" is a comment declaration with data characters inside.
>>>> > "third" is part of the text. It needs to be counted.
>>>>
>>>> *sigh*
>>>>
>>>> I knew this was going to happen.
>>>> ">third<" is the comment. That is one valid comment declaration tag.
>>>
>>>
>>> This may be bad html.  Better form would be to escape the inner angle 
>>> brackets, i.e.
>>> <!--&gt;third&lt;-->
>>
>> Yep, if you want "--<third>--" to be a comment, then you have to 
>> escape the angle brackets to "--&gt;third&lt;--".
>>
>> If "<!-->third<-->" is to be parsed literally, then
>> "<!-->" is the comment "--",
>> "third" is text and
>> "<-->" is an illegal tag.
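
For what it is worth, the two readings argued over above can be put side by side in a rough Python sketch. This is not bogofilter's lexer and not any browser's algorithm. The names visible_text, end_of_declaration and the lenient flag are all made up for illustration. The strict mode follows the SGML-style reading where each "--" inside a "<!" declaration opens or closes a comment. The lenient mode simply assumes that a literal "<!-->" is a complete, empty comment, which is my guess at what would produce the Mozilla result.

```python
import re

# Ordinary tags such as <br> or </p>. "<-->" does not match, because it
# does not start with a letter (optionally preceded by "/").
TAG = re.compile(r"</?[A-Za-z][^>]*>")


def end_of_declaration(text, start, lenient):
    """Return the index just past the declaration starting at text[start:] == '<!...'.

    Strict reading: each "--" opens or closes a comment, and ">" ends the
    declaration only when we are outside a comment.
    Lenient reading (an assumption, not a documented browser rule): the
    literal "<!-->" is taken as a complete, empty comment.
    """
    if lenient and text.startswith("<!-->", start):
        return start + 5
    i = start + 2
    in_comment = False
    while i < len(text):
        if text.startswith("--", i):
            in_comment = not in_comment
            i += 2
        elif text[i] == ">" and not in_comment:
            return i + 1
        else:
            i += 1
    return len(text)  # an unterminated declaration swallows the rest


def visible_text(text, lenient=False):
    """Drop declarations and ordinary tags, keep every other character."""
    out = []
    i = 0
    while i < len(text):
        if text.startswith("<!", i):
            i = end_of_declaration(text, i, lenient)
            continue
        m = TAG.match(text, i)
        if m:
            i = m.end()
            continue
        out.append(text[i])
        i += 1
    return "".join(out)


line = "<br><!first> <!--second--> <!-->third<-->"
print(repr(visible_text(line)))                # '  '
print(repr(visible_text(line, lenient=True)))  # '  third<-->'
```

With the strict reading nothing but the two spaces survives, which is what Quanta+ shows. With the lenient reading "third<-->" survives, which is what Mozilla shows.
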
> 
> 
> I beg to differ.  Comments call for two pairs of hyphens.  "<!-->" is 
> nothing described by the spec.
> 
> I _will_ concede that the "...third..." is not valid html.  However 
> bogofilter should do something reasonable with it.  At the moment, using 
> strict_check=no it finds one token, i.e. the word "third".
> 
>> Following some offline discussions, note this DOCTYPE declaration at 
>> the start of HTML docs:
>> <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0 Strict//EN">
>>
>> Bogofilter should discard the above DOCTYPE construct as a comment.
>>
>> The three characters "<", ">" and "&" must be escaped in normal text, 
>> since they have a special meaning in HTML, so if you ever see those 
>> three in a document, then they are part of tags/escape sequences.
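
To make the escaping concrete, here is a small sketch using Python's standard html module. It is only an illustration of the three escapes and has nothing to do with how bogofilter itself handles them. The variable names are arbitrary.

```python
import html

# The text we would like to carry, angle brackets included.
comment_text = ">third<"

# quote=False escapes only "&", "<" and ">", the three characters at issue.
escaped = html.escape(comment_text, quote=False)
print(escaped)                          # &gt;third&lt;
print("<!--" + escaped + "-->")         # <!--&gt;third&lt;-->

# All three special characters in one go.
print(html.escape("<>&", quote=False))  # &lt;&gt;&amp;

# Unescaping gives the original text back.
print(html.unescape(escaped))           # >third<
```
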
> 




