From [was: Performance issues....and ugly news.]
David Relson
relson at osagesoftware.com
Mon Feb 24 03:19:25 CET 2003
At 08:53 PM 2/23/03, Nick Simicich wrote:
Nick,
If you haven't resolved anything by morning, I'll take a crack at your
code. Right now, I'm at the end of my day and not alert enough to tackle
something new and different that I expect to be intricate and complicated.
>I think I am at least a few hours, with any luck, from getting this to
>work. I had a version that failed t.lexer, but I think that it failed
>t.lexer because it switched out the first buffer and switched it right
>back in and missed the "From " token at the beginning of the e-mail. I
>altered the code not to do the buffer switch when you switched from and
>right back to the same state, and it parsed out the "From " token.
>
>Question: In the output set for t.lexer is the word "From". The only
>place that the word "from" can possibly come from is the From at the
>beginning of the first line, the "From" header separator. If I read that
>correctly, the from word should *not* be a token.
>
>In other words, I think that from is only in the output for t.lexer
>because of a bug.
I took a look at this using gdb with a copy of bogolexer built without the
-O2 flag. The "^From " rule recognizes the header separator and returns it
as type FROM to get_token(). Prior to that, is_from() is called twice,
which is inefficient but necessary. As far as I understand lex (and I
understand it less well than you do), this is all right and proper.
Certainly bogofilter expects that to happen.
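
To make the point about the anchored rule concrete, here is a minimal sketch
in plain C of what a "^From " match amounts to: the separator counts only
when "From " (capital F, trailing space) sits at the very start of a line.
The names token_kind, TOK_FROM, TOK_WORD and classify_line_start are made up
for illustration. They are not bogofilter's identifiers, and this is not how
the generated lexer is actually implemented.

```c
#include <string.h>

/* Hypothetical token classes for illustration only; bogofilter's real
 * token types are defined elsewhere and may differ. */
typedef enum { TOK_WORD, TOK_FROM } token_kind;

/* Stand-in for an anchored "^From " rule: report TOK_FROM only when the
 * line begins with the exact bytes "From " (case-sensitive, with the
 * trailing space). Anything else is treated as ordinary text. */
static token_kind classify_line_start(const char *line)
{
    static const char sep[] = "From ";

    /* sizeof sep - 1 excludes the terminating NUL from the comparison. */
    if (strncmp(line, sep, sizeof sep - 1) == 0)
        return TOK_FROM;
    return TOK_WORD;
}
```

Under these assumptions, an mbox separator line such as
"From nick@example.com Sun Feb 23 ..." is classified as TOK_FROM, while a
"From: Nick" header or a lowercase "from" in the body is not.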
David