lexer speedup.
michael at optusnet.com.au
Sun Jul 20 03:34:03 CEST 2003
While I'm playing with the lexer:

Variable-length look-ahead has a noticeable speed penalty in flex.
This patch removes a variable-length look-ahead and speeds up
discarding loose comments.

No functionality change, just speedups (good for about 8% lower
CPU usage for me).

Michael.
--- old.l Sun Jul 20 10:38:31 2003
+++ lexer_v3.l Sun Jul 20 11:30:17 2003
@@ -220,9 +220,14 @@
/* This has to match just as much or more than the below rules, so as to be the
controlling rule. */
-<HTML>{TOKEN}/{HTMLTOKEN}*{BREAKHTML}+{HTMLTOKEN}*.? { return TOKEN; }
+<HTML>{TOKEN}{HTMLTOKEN}*{BREAKHTML}+{HTMLTOKEN}*.? |
-<HTML>{TOKEN}/({HTMLTOKEN})+{WHITESPACE} { return TOKEN; }
+<HTML>{TOKEN}({HTMLTOKEN})+{WHITESPACE} {
+ char *chr = memchr(yytext, '<', yyleng); /* find start of html tag */
+ size_t len = chr - yytext;
+ yyless(len);
+ return TOKEN;
+ }
<HTML>{TOKEN}({HTMLTOKEN})+/{NOTWHITESPACE} {
reorder_html();
@@ -242,12 +247,11 @@
<HTML>"<" { /* unknown tag */ BEGIN HDISCARD; }
<HTOKEN>{TOKEN} { if (tokenize_html_tags) return TOKEN; }
-<HDISCARD>{TOKEN} { /* discard innards of html tags */ }
-<SCOMMENT,LCOMMENT>{TOKEN} { /* discard innards of html comments */ }
+<HDISCARD,LCOMMENT>[^>]*> { /* discard innards of html tags */ BEGIN HTML;}
<HSCRIPT>{TOKEN} { if (tokenize_html_script) return TOKEN; }
<HTOKEN,HDISCARD>">" { BEGIN HTML; } /* end of tag; return to normal html processing */
-<LCOMMENT>">" { BEGIN HTML; } /* end of loose comment; return to normal html processing */
+<SCOMMENT>{TOKEN} { /* discard innards of html comments */ }
<SCOMMENT>"-->" { BEGIN HTML; } /* end of strict comment; return to normal html processing */
{IPADDR} { return IPADDR;}
More information about the Bogofilter mailing list