Performance of plain text parser, one potential error in pattern

David Relson relson at osagesoftware.com
Fri Feb 21 15:42:22 CET 2003


Nick,

I'll take a look at your lexer changes.

The backslash/bracket changes should be easy to test.

Deleting trailing periods from tokens sounds like the right behavior.  If 
we're not doing that already, we should change the lexer so it does.  It's 
fine if that changes the test results.  I've come to recognize that lexer 
fixes/changes _always_ call for updating the reference results.
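
To make the trailing-period rule concrete, here is a minimal sketch in C.  
The helper name strip_trailing_periods() is hypothetical and is not taken 
from the bogofilter sources; it assumes the token is a writable, 
NUL-terminated string that is post-processed after the lexer matches it.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not bogofilter's actual code: remove every
 * trailing '.' from a NUL-terminated token in place and return the
 * new length. */
size_t strip_trailing_periods(char *token)
{
    size_t len = strlen(token);

    /* Walk back over any run of periods at the end of the token. */
    while (len > 0 && token[len - 1] == '.')
        len--;
    token[len] = '\0';
    return len;
}
```

A token made up only of periods comes back with length 0, so a caller 
would presumably want to discard empty results rather than store them.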

I did the performance tests with Greg's large files.  I can include your 
changes and see what happens.  We should be able to get your command for 
hand-building lexer_text_plain.c into the Makefile.

Having mucked with the lexer code, I think you're beginning to appreciate 
Matthias' plan: a main lexer handles the header and MIME boundaries and 
calls secondary lexers for normal text and for HTML, with the secondary 
lexers being totally oblivious to header fields, MIME boundaries, and 
"^From " lines.  You're quickly becoming our lexer expert :-)
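
The main/secondary split described above could look roughly like the 
following C sketch.  Every name in it (main_lexer, main_lexer_feed, 
lex_plain, lex_html) is hypothetical, and it is not the design as actually 
implemented.  It assumes input arrives one line at a time without the 
trailing newline, that the MIME boundary is already known, and that header 
matching is case-sensitive, which a real lexer could not assume.

```c
#include <string.h>

/* All names here are hypothetical; this is not bogofilter's code. */
typedef void (*body_lexer_fn)(const char *line);

/* Stand-in secondary lexers: they only count the lines they receive
 * and never see header fields, boundaries, or "From " lines. */
static int plain_lines, html_lines;
static void lex_plain(const char *line) { (void)line; plain_lines++; }
static void lex_html(const char *line)  { (void)line; html_lines++; }

typedef struct {
    int           in_header;   /* 1 while reading header fields */
    body_lexer_fn body_lexer;  /* secondary lexer for the current part */
    const char   *boundary;    /* boundary with leading "--", or NULL */
} main_lexer;

static int starts_with(const char *s, const char *prefix)
{
    return strncmp(s, prefix, strlen(prefix)) == 0;
}

void main_lexer_init(main_lexer *ml, const char *boundary)
{
    ml->in_header = 1;
    ml->body_lexer = lex_plain;
    ml->boundary = boundary;
}

void main_lexer_feed(main_lexer *ml, const char *line)
{
    if (starts_with(line, "From ")) {
        /* mbox separator: a new message starts, back to header mode */
        ml->in_header = 1;
        ml->body_lexer = lex_plain;
    } else if (ml->boundary != NULL && starts_with(line, ml->boundary)) {
        /* MIME boundary: the headers of the next part follow */
        ml->in_header = 1;
        ml->body_lexer = lex_plain;
    } else if (ml->in_header) {
        if (line[0] == '\0')
            ml->in_header = 0;            /* blank line ends the header */
        else if (starts_with(line, "Content-Type: text/html"))
            ml->body_lexer = lex_html;    /* pick the HTML lexer */
    } else {
        ml->body_lexer(line);             /* body text only */
    }
}
```

The point of the sketch is only the division of labor: the main lexer 
owns all the message-structure decisions, so either secondary lexer can be 
changed or rebuilt without touching header or boundary handling.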

David




