What about html_reorder?

Wed May 12 18:15:59 CEST 2004

Hi all,

  I am working on getting the most out of the bogofilter lexer
speed-wise. The fewer mallocs and the less parsing, the better :)
  That's why this line confuses me:

  <HTML>{TOKEN_12}({HTMLTOKEN})+/{NOTWHITESPACE}    { html_reorder(); return TOKEN; }

And then there is the code after html_reorder(), which mallocs, swaps
memory around and calls yyunput.
I do all the MIME parsing before sending each decoded MIME part to the
lexer, and I set the initial state myself. For example, for an HTML
part, the lexer is called with initial state HTML, and the buffer is
the HTML part itself.

That sometimes gives me a nice bug that is otherwise never seen:

  flex scanner push-back overflow

This means that the unput went too far and stepped outside of the
buffer. It didn't throw an error before (when the parser handled the
whole email) because there was always some data before the HTML part.
That doesn't mean the bug didn't exist, though; it just didn't crash :)

To solve two problems at once (yeah, I am that kind of person), what
about getting rid of this whole html_reorder thing?

Wouldn't this be enough:
  <HTML>{TOKEN_12}({HTMLTOKEN})+/{NOTWHITESPACE}    { return HTMLREORDERTOKEN; }

And in the C code (token.c), in the big switch:

  case HTMLREORDERTOKEN:
      /* real length of the token = position of the first '<' */

just as is already done for HEADKEY.

So, what do you guys think?

Giorgio