effects of lexer changes
David Relson
relson at osagesoftware.com
Tue Dec 31 15:53:26 CET 2002
Matthias,
I've been looking at the regression tests to understand why they're
failing. So far I've identified several changes in lexer.l that cause
output to be different. FWIW, I'm testing with 0.9.1.2 and the current
mime processing lexer. The comments below apply to bogolexer and message
tests/t.systest.d/inputs/msg.2.txt.
1 - changing MAXTOKENLEN from 20 to 30 adds some tokens, for example
www.genuinerewards.com. This is fine.
2 - BOUNDARY tokens are no longer returned by get_token(). Fine.
3 - lexer.l is caseless, so "All" matches pattern "all" and disappears from
the output. Fine.
These are all correct.
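
To make item 1 concrete, here is a minimal C sketch of a length check. It is not the code in lexer.l: the helper name token_fits() is hypothetical, and I'm assuming over-length tokens are dropped rather than truncated, which is what the extra tokens in the output suggest. "www.genuinerewards.com" is 22 characters, so it fails a limit of 20 and passes a limit of 30.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not bogofilter's actual code.
 * Assumption: a token longer than the limit is dropped, not truncated. */
bool token_fits(const char *token, size_t max_token_len)
{
    return strlen(token) <= max_token_len;
}
```

With this check, token_fits("www.genuinerewards.com", 20) is false and token_fits("www.genuinerewards.com", 30) is true, which matches the token showing up only after the change.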
4 - tokens from mime directives are not being output. For example,
"Content-Type: text/plain" used to return 3 tokens. Looks like yyredo()
isn't working. I'll fix it.
5 - The new code to "ignore anything when not reading text MIME types" may
be overzealous. In msg.2.txt, it causes the text between the message
header and the first boundary line to be ignored. Since there _is_ text
there, I think we want bogofilter to see it. What do you think?
The code to ignore tokens shouldn't take effect until bogofilter is in the
body of a mime part. I think there's a problem of stack level confusion,
i.e. the level for setting mime info and the level for using it.
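
A rough C sketch of the rule I have in mind follows. None of these names come from the bogofilter source: mime_kind, mime_level and ignore_tokens() are made up for illustration, and I'm assuming a simple stack with one entry per nesting level. The point is that the same level (the top of the stack) is used both for recording the MIME info and for deciding whether to ignore tokens, and that tokens are ignored only in the body of a non-text, non-multipart part.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical types for illustration only. */
enum mime_kind { MIME_TEXT, MIME_MULTIPART, MIME_OTHER };

struct mime_level {
    enum mime_kind kind;  /* set from this level's Content-Type */
    bool in_body;         /* false while reading this level's header */
};

/* depth is the number of levels currently on the stack;
 * stack[depth - 1] is the level being read. */
bool ignore_tokens(const struct mime_level *stack, size_t depth)
{
    const struct mime_level *cur;

    if (depth == 0)
        return false;               /* nothing known yet: keep tokens */

    cur = &stack[depth - 1];

    /* Header tokens are always kept. Text directly inside a multipart
     * container (before the first boundary) is kept too. Only the body
     * of a non-text leaf part is ignored. */
    return cur->in_body && cur->kind == MIME_OTHER;
}
```

Under these assumptions, the text between the message header and the first boundary in msg.2.txt sits in the body of the top-level multipart entry, so it is kept, and the "Content-Type:" line of each part is still tokenized because in_body is false while the part's header is being read.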
The command "bogolexer -p -x lm -vvv < tests/t.systest.d/inputs/msg.2.txt"
gives a good view of what's happening.
David