lexer speedup.

michael at optusnet.com.au
Sun Jul 20 03:34:03 CEST 2003


While I'm playing with the lexer:
variable-length look-ahead has a noticeable
speed penalty in flex. This patch removes
a variable-length look-ahead and speeds
up discarding loose comments.

No functionality change. Just speedups.

(Good for about 8% lower CPU usage for me).
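For anyone curious about the technique, here is a minimal standalone sketch
(not bogofilter code; {WORD} and {TAG} are made-up patterns) of replacing a
variable-length trailing context such as {WORD}/{TAG} with one longer match
plus yyless(), which hands the look-ahead part back to the input:

%{
/* Minimal sketch only: match word and tag together, then push the tag
 * back with yyless() so just the word is consumed -- same effect as a
 * {WORD}/{TAG} trailing context, without the look-ahead cost. */
#include <stdio.h>
#include <string.h>
%}

WORD    [A-Za-z]+
TAG     "<"[^>]*">"

%%

{WORD}{TAG}     {
                    char *lt = memchr(yytext, '<', yyleng); /* tag start */
                    yyless(lt - yytext);   /* keep only the word part    */
                    printf("word: %s\n", yytext);
                }
{TAG}           { /* tag alone: ignore */ }
.|\n            { /* anything else: ignore */ }

%%

int yywrap(void) { return 1; }
int main(void)   { return yylex(); }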

Michael.

--- old.l	Sun Jul 20 10:38:31 2003
+++ lexer_v3.l	Sun Jul 20 11:30:17 2003
@@ -220,9 +220,14 @@
 
   /* This has to match just as much or more than the below rules, so as to be the 
      controlling rule. */
-<HTML>{TOKEN}/{HTMLTOKEN}*{BREAKHTML}+{HTMLTOKEN}*.?	{ return TOKEN; }
+<HTML>{TOKEN}{HTMLTOKEN}*{BREAKHTML}+{HTMLTOKEN}*.?	|
 
-<HTML>{TOKEN}/({HTMLTOKEN})+{WHITESPACE}		{ return TOKEN; }
+<HTML>{TOKEN}({HTMLTOKEN})+{WHITESPACE}		{ 
+    			char *chr = memchr(yytext, '<', yyleng);	/* find start of html tag */
+			size_t len = chr - yytext;
+			yyless(len);
+			return TOKEN;
+			}
 
 <HTML>{TOKEN}({HTMLTOKEN})+/{NOTWHITESPACE} 	{
 						    reorder_html();
@@ -242,12 +247,11 @@
 <HTML>"<"					{ /* unknown tag */ BEGIN HDISCARD; }
 
 <HTOKEN>{TOKEN}					{ if (tokenize_html_tags)     return TOKEN; }
-<HDISCARD>{TOKEN}				{ /* discard innards of html tags     */ }
-<SCOMMENT,LCOMMENT>{TOKEN}			{ /* discard innards of html comments */ }
+<HDISCARD,LCOMMENT>[^>]*>			{ /* discard innards of html tags     */ BEGIN HTML;}
 <HSCRIPT>{TOKEN}				{ if (tokenize_html_script)   return TOKEN; }
 
 <HTOKEN,HDISCARD>">"				{ BEGIN HTML; }	/* end of tag; return to normal html processing */
-<LCOMMENT>">"					{ BEGIN HTML; }	/* end of loose comment; return to normal html processing */
+<SCOMMENT>{TOKEN}				{ /* discard innards of html comments */ }
 <SCOMMENT>"-->"					{ BEGIN HTML; }	/* end of strict comment; return to normal html processing */
 
 {IPADDR}					{ return IPADDR;}
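The second half of the change, discarding tag/comment innards with a single
[^>]*">" match instead of one token at a time, can be sketched in isolation
like this (again a made-up scaffold; the real patch does it inside the
HTML/HDISCARD/LCOMMENT start conditions):

%{
/* Minimal sketch only: one [^>]*">" rule swallows everything up to the
 * closing '>' in a single match and returns to normal processing.
 * INITIAL stands in for the patch's HTML start condition. */
#include <stdio.h>
%}

%x HDISCARD

%%

"<"                 { BEGIN HDISCARD; }   /* unknown tag: start discarding   */
<HDISCARD>[^>]*">"  { BEGIN INITIAL; }    /* eat whole tag body in one match */
[^<]+               { ECHO; }             /* pass everything else through    */

%%

int yywrap(void) { return 1; }
int main(void)   { return yylex(); }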
