html comment question

David Relson relson at osagesoftware.com
Sun Jan 19 23:42:07 CET 2003


At 05:27 PM 1/19/03, Gyepi SAM wrote:

>On Sun, Jan 19, 2003 at 05:21:01PM -0500, David Relson wrote:
> > Hi,
> >
> > My code for killing html comments is working fine, so long as I assume that
> > an html comment tag, i.e. "<!--", isn't broken into multiple
> > lines.  Dealing with multiple lines turns out to be rather nasty.  Is it
> > necessary?
>
>Given the ways of spammers, I would say yes. I have not seen the code, but I
>assumed you were doing something like:
>
>When you see '<!--' in html, set a flag and skip those and all other
>characters while the flag is set and you have not seen a '-->', which would
>also be skipped and the flag turned off.
>
>That would fix the multiple line problem, no?

I wish :-)

When my code sees a '<', it checks whether the buffer has 3 additional
characters (for the "!--").  If not, it reads more lines until it has the 3
characters it needs.  I encountered a problem with a quoted-printable (qp)
line that ends "<i>=", where the trailing '=' is a soft line break.  After
qp_decode() the buffer has "<i>", which is only 2 characters after the '<',
so I read another line.  That line is qp_decode()'ed just fine, and the fact
that it's not an html comment is recognized just fine.  The problem is that
yylex doesn't know that data has been added to the buffer.  I'm hesitant to
just increment yyleng, but I guess I'll have to dig in, increment it, and see
what breaks next.  To be honest, I don't like mucking inside system routines
like yylex because of the snowball effect - a small change begets a large
change begets ...
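
For comparison, here is a rough sketch of the flag approach described in the
quoted text, extended so that a partly matched "<!--" or "-->" is remembered
between calls instead of reading ahead for more lines.  It is not
bogofilter's code: the struct and the names strip_feed() and strip_finish()
are made up for illustration, and it assumes each line has already been
through qp decoding, so a soft line break leaves no newline in the middle of
a tag.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical state carried from one decoded line to the next. */
struct comment_stripper {
    int in_comment;   /* nonzero while between "<!--" and "-->" */
    size_t matched;   /* chars of the current delimiter matched so far */
};

static const char OPEN_TAG[] = "<!--";
static const char CLOSE_TAG[] = "-->";

/*
 * Feed one chunk of decoded text.  Non-comment bytes are copied to out and
 * the number of bytes written is returned.  out needs room for len + 3
 * bytes, because up to 3 held-back chars ("<!-") from an earlier call may
 * be flushed here.
 */
size_t strip_feed(struct comment_stripper *s, const char *in, size_t len,
                  char *out)
{
    size_t n = 0;

    for (size_t i = 0; i < len; i++) {
        char c = in[i];

        if (s->in_comment) {
            if (c == CLOSE_TAG[s->matched]) {
                if (++s->matched == 3) {      /* saw all of "-->" */
                    s->in_comment = 0;
                    s->matched = 0;
                }
            } else if (c == '-' && s->matched == 2) {
                /* "---": the last two dashes still count, stay at 2 */
            } else {
                s->matched = 0;               /* comment text, skip it */
            }
        } else {
            if (c == OPEN_TAG[s->matched]) {
                if (++s->matched == 4) {      /* saw all of "<!--" */
                    s->in_comment = 1;
                    s->matched = 0;
                }
            } else {
                /* Not a comment after all: emit what was held back. */
                memcpy(out + n, OPEN_TAG, s->matched);
                n += s->matched;
                if (c == '<') {
                    s->matched = 1;           /* may start a new "<!--" */
                } else {
                    s->matched = 0;
                    out[n++] = c;
                }
            }
        }
    }
    return n;
}

/* At end of input, flush a held-back partial "<!--" (at most 3 bytes). */
size_t strip_finish(struct comment_stripper *s, char *out)
{
    size_t n = 0;

    if (!s->in_comment) {
        memcpy(out, OPEN_TAG, s->matched);
        n = s->matched;
    }
    s->matched = 0;
    return n;
}
```

With a zero-initialized struct, feeding "a<i>" and then "b<!" holds the "<!"
back rather than asking for more input; if the next chunk starts with "--"
the comment begins there, otherwise the held characters are emitted.  Because
all the state lives in the struct, nothing has to be appended to the buffer
yylex is already scanning.  Whether this fits into the existing lexer is a
separate question.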






More information about the bogofilter-dev mailing list