From [was: Performance issues....and ugly news.]

Mon Feb 24 03:19:25 CET 2003

At 08:53 PM 2/23/03, Nick Simicich wrote:

Nick,

If you haven't resolved anything by morning, I'll take a crack at your 
code.  Right now, I'm at the end of my day and not alert enough to tackle 
something new and different that I expect to be intricate and complicated.

>I think I am at least a few hours, with any luck, from getting this to 
>work.  I had a version that failed t.lexer, but I think that it failed 
>t.lexer because it switched out the first buffer and switched it right 
>back in and missed the "From " token at the beginning of the e-mail.  I 
>altered the code not to do the buffer switch when you switched from and 
>right back to the same state, and it parsed out the "From " token.
>
>Question:  In the output set for t.lexer is the word "From".  The only 
>place that the word "from" can possibly come from is the From at the 
>beginning of the first line, the "From" header separator.  If I read that 
>correctly, the from word should *not* be a token.
>
>In other words, I think that from is only in the output for t.lexer 
>because of a bug.

I took a look at this using gdb with a copy of bogolexer built without the 
-O2 flag.  The "^From " rule recognizes the header separator and returns it 
as type FROM to get_token().  Prior to that, is_from() is called twice, 
which is inefficient but necessary.  As far as I understand lex (and I 
understand it less well than you do), this is all right and proper.  
Certainly bogofilter expects that to happen.
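
To illustrate the behavior described above, here is a rough sketch in 
Python rather than lex.  It is not bogofilter's code: the token type 
names, the word pattern, and the get_tokens() helper are made up for the 
example, and it does not model is_from() at all.  It only shows a 
"^From " style rule returning the separator as its own FROM type, so the 
caller can tell it apart from an ordinary word.

```python
import re

# Hypothetical token type names for this sketch, not bogofilter's enum.
FROM = "FROM"
TOKEN = "TOKEN"

# Analogue of the lex rule "^From ": only matches at the start of a line.
FROM_RULE = re.compile(r"^From ")
# Simplified word pattern, assumed for illustration only.
WORD_RULE = re.compile(r"[A-Za-z0-9]+")


def get_tokens(text):
    """Yield (type, text) pairs for each line of the input.

    A line starting with "From " yields a FROM-typed token first; the
    rest of the line, and every other line, yields plain word tokens.
    """
    for line in text.splitlines():
        rest = line
        match = FROM_RULE.match(line)
        if match:
            yield (FROM, match.group(0))
            rest = line[match.end():]
        for word in WORD_RULE.findall(rest):
            yield (TOKEN, word)
```

In this sketch the leading "From " never shows up as an ordinary word 
token; whether the caller then counts a FROM-typed token as a word is a 
separate decision made outside the lexer.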

David