lexer mod [was: singletons]

David Relson relson at osagesoftware.com
Tue Sep 9 04:53:33 CEST 2003


On Tue, 09 Sep 2003 00:30:26 +0200
Matthias Andree <matthias.andree at gmx.de> wrote:

> David Relson <relson at osagesoftware.com> writes:
> 
> > The lexer rules presently include the following:
> >
> > <INITIAL>^Message-ID:.*			;
> 
> That should be
> <INITIAL>^(Resent-)?Message-ID:.*		;
> 
> > <INITIAL>^(Delivery-)?Date:.*			;
> > <INITIAL>(ESMTP|SMTP)+[ \t\n]+id\ {ID}	;
> > <INITIAL>[:blank:]*id\ {ID}			;
> >
> > The second to last has been changed (fixed) recently and the last is
> > new.
> 
> Is it safe to do something like:
> 
> <INITIAL>^(In-Reply-To|References):.* { yyleng = memchr(yytext, ':', yyleng) - yytext - 1; return TOKEN; }
> 
> or will lexer then read the remainder of the line as well because I
> reduced yyleng?

Matthias,

The patch below does the trick.  Not surprisingly it affects "make
check".  If you're sure we want it, I'll add it and update the reference
results so "make check" will be happy.

David

Index: lexer_v3.l
===================================================================
RCS file: /cvsroot/bogofilter/bogofilter/src/lexer_v3.l,v
retrieving revision 1.79
diff -u -r1.79 lexer_v3.l
--- lexer_v3.l	7 Sep 2003 01:10:25 -0000	1.79
+++ lexer_v3.l	9 Sep 2003 02:50:00 -0000
@@ -217,8 +217,10 @@
 <INITIAL>^Content-(Transfer-Encoding|Type|Disposition):{MTYPE}	{ mime_content(yy_text()); skip_to(':'); return TOKEN; }
 <INITIAL>^MIME-Version:.*			{ mime_version(yy_text()); 	skip_to(':'); return TOKEN; }
 
-<INITIAL>^Message-ID:.*				;
+<INITIAL>^(Resent-)?Message-ID:.*		;
 <INITIAL>^(Delivery-)?Date:.*			;
+
+<INITIAL>^(In-Reply-To|References):.* 		{ yyleng = index(yytext, ':') - yytext; return TOKEN; }
 
 <INITIAL>boundary=[ ]*\"?{MIME_BOUNDARY}\"?	{ mime_boundary_set(yy_text()); }
 <INITIAL>charset=\"?{CHARSET}\"?		{ got_charset(yytext); 		skip_to('='); return TOKEN; }
Index: token.c
===================================================================
RCS file: /cvsroot/bogofilter/bogofilter/src/token.c,v
retrieving revision 1.55
diff -u -r1.55 token.c
--- token.c	6 Sep 2003 20:50:39 -0000	1.55
+++ token.c	9 Sep 2003 02:50:01 -0000
@@ -77,6 +77,7 @@
 	cls = lexer->yylex();
 	yylval->leng = *lexer->yyleng;
 	yylval->text = (unsigned char *)(*lexer->yytext);
+	yylval->text[yylval->leng] = '\0';
 
 	if (DEBUG_TEXT(2)) { 
 	    word_puts(yylval, 0, dbgout);
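
For readers who want to see the truncation outside the scanner, here is a
minimal standalone sketch in C of what the two hunks do together: find the
first ':' in the matched text, use its offset as the new token length, and
NUL-terminate the buffer at that length. The helper names header_name_len()
and truncate_at_colon() are hypothetical and not part of bogofilter; the
sketch uses memchr() rather than index() only so that a missing ':' can be
handled. The offset of the ':' is already the length of the header name, so
no "- 1" is needed. As far as I know, assigning to yyleng in a flex action
only changes the length the action reports: the whole match has already been
consumed, so the rest of the line is skipped rather than scanned again
(yyless() is the documented way to push text back).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not in bogofilter): number of bytes before the
 * first ':' in buf.  If no ':' occurs within len bytes, len is returned
 * unchanged. */
static size_t header_name_len(const char *buf, size_t len)
{
    const char *colon;

    assert(buf != NULL);
    colon = memchr(buf, ':', len);
    return colon != NULL ? (size_t)(colon - buf) : len;
}

/* Hypothetical helper: shorten the token to the header name and
 * NUL-terminate it in place, mirroring
 *     yylval->text[yylval->leng] = '\0';
 * from the token.c hunk.  buf must be writable and hold at least
 * len + 1 bytes. */
static size_t truncate_at_colon(char *buf, size_t len)
{
    size_t n = header_name_len(buf, len);

    buf[n] = '\0';
    return n;
}
```

Calling truncate_at_colon() on a buffer holding
"References: <abc@example.com>" leaves "References" in the buffer and
returns 10.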



