Performance of plain text parser, one potential error in pattern..
David Relson
relson at osagesoftware.com
Fri Feb 21 15:42:22 CET 2003
Nick,
I'll take a look at your lexer changes.
The backslash/bracket changes should be easy to test.
Deleting trailing periods from tokens sounds like the right behavior. If
we're not doing that already, we should change the lexer to do so. If that
changes the test results, that's fine. I've come to recognize that lexer
fixes/changes _always_ call for updating the reference results.
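
For concreteness, here is a minimal sketch of what "deleting trailing
periods" could look like as a post-processing step on a token. The function
name strip_trailing_periods is hypothetical and this is not taken from the
bogofilter sources; it only assumes the token is a writable, NUL-terminated
string.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not bogofilter's actual code: remove every
 * trailing '.' from a NUL-terminated token, in place.
 * Returns the new length of the token. */
size_t strip_trailing_periods(char *token)
{
    size_t len = strlen(token);

    /* Walk back from the end, overwriting each trailing '.' with NUL. */
    while (len > 0 && token[len - 1] == '.')
        token[--len] = '\0';

    return len;
}
```

With this, a token such as "example.com." would become "example.com", while
interior periods are left alone. Any token that previously ended in a period
would change, which is why the reference results would need regenerating.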
I did the performance tests with Greg's large files. I can include your
changes and see what happens. We should be able to get your command for
hand-building lexer_text_plain.c into the Makefile.
Having mucked with the lexer code, I think you're beginning to appreciate
Matthias' plan: a main lexer handles the header and mime boundaries and
calls secondary lexers for normal text and for html. The secondary lexers
would be totally oblivious to header fields, mime boundaries, and
"^From " lines. You're quickly becoming our lexer expert :-)
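
A rough sketch of that split, to make the idea concrete. Everything here is
an assumption for illustration (the names main_lexer, main_lexer_feed,
body_lexer and count_body_line are made up, and the real design may differ):
the main lexer looks at each line, deals with "From " lines, mime boundaries
and header fields itself, and hands only body lines to whichever secondary
lexer is currently selected.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* All names below are hypothetical; this is not bogofilter's code. */

typedef enum { LINE_FROM, LINE_BOUNDARY, LINE_HEADER, LINE_BODY } line_kind;

/* A secondary lexer only ever sees body text. */
typedef void (*body_lexer)(const char *line);

typedef struct {
    bool in_header;        /* true until the blank line ending the header */
    const char *boundary;  /* mime boundary string, or NULL if none */
    body_lexer lex_body;   /* current secondary lexer (plain text or html) */
} main_lexer;

/* Stand-in secondary lexer: it just counts the body lines it is given. */
int body_lines_seen = 0;
void count_body_line(const char *line)
{
    (void)line;
    body_lines_seen++;
}

/* Feed one line (without its newline) to the main lexer. */
line_kind main_lexer_feed(main_lexer *lx, const char *line)
{
    /* "^From " starts a new message, so a header follows. */
    if (strncmp(line, "From ", 5) == 0) {
        lx->in_header = true;
        return LINE_FROM;
    }

    /* "--boundary" starts a new mime part with its own header.
     * Simplification: the closing "--boundary--" is treated the same. */
    if (lx->boundary != NULL && line[0] == '-' && line[1] == '-' &&
        strncmp(line + 2, lx->boundary, strlen(lx->boundary)) == 0) {
        lx->in_header = true;
        return LINE_BOUNDARY;
    }

    /* Header fields stay in the main lexer; a blank line ends them. */
    if (lx->in_header) {
        if (line[0] == '\0')
            lx->in_header = false;
        return LINE_HEADER;
    }

    /* Everything else is body text for the secondary lexer. */
    lx->lex_body(line);
    return LINE_BODY;
}
```

The point of the function pointer is that switching between the plain text
and html lexers becomes a single assignment in the main lexer, and neither
secondary lexer needs any rules for headers, boundaries, or "From " lines.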
David