html comment question
David Relson
relson at osagesoftware.com
Sun Jan 19 23:42:07 CET 2003
At 05:27 PM 1/19/03, Gyepi SAM wrote:
>On Sun, Jan 19, 2003 at 05:21:01PM -0500, David Relson wrote:
> > Hi,
> >
> > My code for killing html comments is working fine, so long as I assume that
> > an html comment tag, i.e. "<!--", isn't broken into multiple
> > lines. Dealing with multiple lines turns out to be rather nasty. Is it
> > necessary?
>
>Given the ways of spammers, I would say yes. I have not seen the code, but
>I assumed you were doing something like:
>
>When you see '<!--' in html, set a flag and skip those and all other
>characters while the flag is set and you have not seen a '-->', which
>would also be skipped and the flag turned off.
>
>That would fix the multiple line problem, no?

I wish :-)

When my code sees a '<', it checks whether the buffer has 3 additional
characters (for the "!--"). If not, it reads more lines until it has the 3
characters it needs. I encountered a problem with a qp (quoted-printable) line
that ends "<i>=", where the trailing "=" is a soft line break. After qp_decode
the buffer has "<i>", which is only 2 characters after the '<', so I read
another line. That line is qp_decode()'ed just fine, and the fact that it's
not an html comment is recognized just fine. The
problem is that yylex doesn't know that data has been added to the
buffer. I'm hesitant to just increment yyleng, but I guess I'll have to dig
in, increment it, and see what breaks next. To be honest, I don't like
mucking inside system routines like yylex because of the snowball effect -
a small change begets a large change begets ...
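
For reference, here is a minimal sketch of the flag approach from the quoted
text, written so that the state survives from one line to the next. It is not
bogofilter's code: comment_state, strip_comments() and flush_pending() are
hypothetical names, and it assumes "<!--" and "-->" can be treated as plain
delimiters. Because a partial "<!-" at the end of a line is simply held back
until the next call, nothing has to read ahead for 3 more characters.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical state carried from one line to the next. */
typedef struct {
    int    in_comment; /* the "flag": nonzero between "<!--" and "-->" */
    size_t matched;    /* chars of the current delimiter matched so far */
} comment_state;

static const char OPEN_TAG[]  = "<!--";
static const char CLOSE_TAG[] = "-->";

/* Copy src[0..n) to dst, dropping comment text. dst needs room for n + 3
 * bytes, since a partial "<!-" held back by an earlier call may be
 * released here. Returns the number of bytes written. */
size_t strip_comments(comment_state *st, const char *src, size_t n, char *dst)
{
    size_t out = 0;
    assert(st != NULL && src != NULL && dst != NULL);

    for (size_t i = 0; i < n; i++) {
        char c = src[i];
        if (st->in_comment) {
            if (c == CLOSE_TAG[st->matched]) {
                if (++st->matched == 3) {      /* saw "-->": clear the flag */
                    st->in_comment = 0;
                    st->matched = 0;
                }
            } else if (!(st->matched == 2 && c == '-')) {
                /* Mismatch: start over, except that "--" followed by
                 * another '-' still ends in "--". */
                st->matched = 0;
            }
        } else if (c == OPEN_TAG[st->matched]) {
            if (++st->matched == 4) {          /* saw "<!--": set the flag */
                st->in_comment = 1;
                st->matched = 0;
            }
        } else {
            /* Not a comment after all: release the held-back prefix. */
            memcpy(dst + out, OPEN_TAG, st->matched);
            out += st->matched;
            st->matched = 0;
            if (c == '<')
                st->matched = 1;               /* may begin a new "<!--" */
            else
                dst[out++] = c;
        }
    }
    return out;
}

/* At end of input, release any held-back partial "<!--" (up to 3 bytes). */
size_t flush_pending(comment_state *st, char *dst)
{
    size_t out = 0;
    if (!st->in_comment) {
        memcpy(dst, OPEN_TAG, st->matched);
        out = st->matched;
    }
    st->matched = 0;
    return out;
}
```

Each decoded line would be passed through strip_comments() with the same
comment_state, and flush_pending() called once at the end of the message.
Whether this could run before the text reaches yylex, so that yyleng never
needs adjusting, depends on how the input is fed to the lexer.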