lexer mod

Tue Sep 9 23:27:09 CEST 2003

On Tue, 09 Sep 2003 23:21:08 +0200
Matthias Andree <matthias.andree at gmx.de> wrote:

> David Relson <relson at osagesoftware.com> writes:
> 
> > The patch below does the trick.  Not surprisingly it affects "make
> > check".  If you're sure we want it, I'll add it and update the
> > reference results so "make check" will be happy.
> 
> I'm not convinced it improves accuracy, but I'm convinced it keeps
> unique tokens out of the data base, and it fits into the scheme of
> discarding the Message-ID (likely we'd also need to scrap Outlook's
> Thread-* stuff) and queue IDs from headers.
> 
> > diff -u -r1.55 token.c
> > --- token.c	6 Sep 2003 20:50:39 -0000	1.55
> > +++ token.c	9 Sep 2003 02:50:01 -0000
> > @@ -77,6 +77,7 @@
> >  	cls = lexer->yylex();
> >  	yylval->leng = *lexer->yyleng;
> >  	yylval->text = (unsigned char *)(*lexer->yytext);
> > +	yylval->text[yylval->leng] = '\0';
> >  
> >  	if (DEBUG_TEXT(2)) { 
> >  	    word_puts(yylval, 0, dbgout);
> 
> Do we need this diff? word_puts handles it.

It's probably not 100% necessary, but it makes debugging easier.  I can
use the Z macro which (theoretically, i.e. not yet tested) will generate
nothing if debug switches are set properly.
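
For illustration, here is a minimal sketch of that idea. It is not the actual bogofilter code: the word_t layout, the set_token() helper, the NO_DEBUG_TERMINATOR switch, and this particular definition of Z are all assumptions made for the example. It only shows how the terminating write could compile away when the debug aid is switched off.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the token structure; only the two fields
 * touched by the patch are modeled here. */
typedef struct {
    unsigned char *text;
    size_t leng;
} word_t;

/* Assumed definition of a Z-style macro (not taken from the real source):
 * write a terminating NUL unless NO_DEBUG_TERMINATOR is defined, in which
 * case the macro expands to a no-op. */
#ifndef NO_DEBUG_TERMINATOR
#define Z(ch) ((ch) = '\0')
#else
#define Z(ch) ((void)0)
#endif

/* Hypothetical helper mirroring the three assignments in the diff.
 * buf must have room for at least len + 1 bytes. */
void set_token(word_t *w, unsigned char *buf, size_t len, size_t cap)
{
    assert(len < cap);       /* the terminator must fit in the buffer */
    w->text = buf;
    w->leng = len;
    Z(w->text[w->leng]);     /* makes the token printable as a C string */
}
```

With the terminator in place, the token can be inspected in a debugger or passed to printf("%s") directly. Without it, code has to respect leng, which is what word_puts does.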