The joy of buffer switching....

Nick Simicich njs at scifi.squawk.com
Mon Feb 24 16:25:29 CET 2003


I have spent several more hours trying to work on moving buffers from 
flexer to flexer.  I am convinced that the approach will not work.  It is 
supposed to, but it does not.

The specific issue is this:

(gdb) p *yy_current_buffer
$29 = {yy_input_file = 0x40231ce0,
   yy_ch_buf = 0x80e42f0 "\nFrom glamb at dynagen.on.ca  Fri Nov  1 06:20:26 2002\n56 at wall.org>\n",
   yy_buf_pos = 0x80e42f1 "From glamb at dynagen.on.ca  Fri Nov  1 06:20:26 2002\n56 at wall.org>\n",
   yy_buf_size = 16384, yy_n_chars = 1, yy_is_our_buffer = 1,
   yy_is_interactive = 1, yy_at_bol = 1, yy_fill_buffer = 1,
   yy_buffer_status = 1}
(gdb) p yy_n_chars
$30 = 2
(gdb)

We are in the call where the buffer is being extracted. We have already
moved this buffer head-to-text, and that worked. We are now moving the
buffer text-to-head. This is the first time we are extracting a buffer from
the plain text flexer in yy_switch_to_buffer(new_buffer).  The code that is
about to be executed is going to completely screw up the buffer: it will
overlay the 'o' in "From" with a null.  The lexer has parsed the "From "
token out, and we should be saving the rest of the line for the next state.
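
For anyone wondering why a scanner would write a null into its own buffer
at all: flex terminates yytext in place and keeps the character it
overwrote in yy_hold_char, putting it back before the next scan.  The
sketch below is a toy model of that bookkeeping, not flex's code.  The
struct and the mini_* names are made up for illustration, and it does not
reproduce the bug above, only the mechanism that has to be undone correctly
before a buffer is stashed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature scanner state; the field names imitate flex,
   but this is an illustration and not flex's implementation. */
struct mini_scanner {
    char *buf;        /* buffer being scanned (modified in place) */
    char *pos;        /* start of the text not yet returned */
    char  hold_char;  /* character overwritten by the terminating NUL */
    int   holding;    /* nonzero while a NUL is patched into buf */
};

/* Return the next len bytes as a NUL-terminated token, in place. */
static char *mini_take(struct mini_scanner *s, size_t len)
{
    char *tok = s->pos;

    assert(!s->holding);      /* caller must release the previous token */
    s->pos += len;
    s->hold_char = *s->pos;   /* remember what the NUL will clobber */
    *s->pos = '\0';
    s->holding = 1;
    return tok;
}

/* Undo the in-place NUL so the rest of the buffer is intact again. */
static void mini_release(struct mini_scanner *s)
{
    if (s->holding) {
        *s->pos = s->hold_char;
        s->holding = 0;
    }
}
```

In this model, a buffer that is put away between mini_take() and
mini_release() still has the null sitting in the middle of its text, which
is the kind of damage described above.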

I am through working on this for the day.  If someone else can come up with
a workable buffer swapping scheme, I will certainly listen, and likewise if
someone can tell me what I am doing wrong.  I am still essentially running
the patch I posted yesterday, except that I turned off optimization so that
I can run gdb more easily and so that the trace commands work more
predictably.

THIS IS SUPPOSED TO WORK, as far as I can tell from the man page.  You are
*SUPPOSED* to be able to stash a partially processed buffer, then go off
and do something else with the lexer, then return to the buffer, with the
buffer holding the state for where you are in the input stream.  The buffer
swapping is the essence of processing include files.  You hide the input
buffer in your own data structure, switch buffers, and handle the input
associated with the new buffer.
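
The include-file pattern can be modelled without flex at all.  What follows
is a rough sketch under that assumption: src_buf, src_stack, src_push and
src_getc are hypothetical names, and each "buffer" is just a string plus a
read position.  The point it illustrates is that the stashed buffer keeps
its own position, so reading resumes exactly where it left off once the
inner input runs out.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEPTH 8

/* Hypothetical stand-in for a lexer buffer: the text plus how far we got. */
struct src_buf {
    const char *text;
    size_t      pos;
};

struct src_stack {
    struct src_buf bufs[MAX_DEPTH];
    int depth;   /* number of buffers in use; bufs[depth - 1] is current */
};

/* Start reading text, leaving the current buffer (and its position)
   stashed underneath. */
static void src_push(struct src_stack *s, const char *text)
{
    assert(s->depth < MAX_DEPTH);
    s->bufs[s->depth].text = text;
    s->bufs[s->depth].pos = 0;
    s->depth++;
}

/* Next character, resuming the stashed buffer when the current one ends.
   Returns -1 once every buffer is exhausted. */
static int src_getc(struct src_stack *s)
{
    while (s->depth > 0) {
        struct src_buf *b = &s->bufs[s->depth - 1];
        char c = b->text[b->pos];

        if (c != '\0') {
            b->pos++;
            return (unsigned char)c;
        }
        s->depth--;   /* end of this input: pop back to the outer one */
    }
    return -1;
}
```

Pushing "XY" after one character of "ab" has been read yields a, X, Y, b
and then -1.  With a real flex scanner the equivalent stack would hold
YY_BUFFER_STATE handles instead of these structs.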

Doing the yy_switch_to_buffer() is supposed to take the variables that are
up in the air inside the lexer and stash them inside the buffer's state
variables. But it just is not working.  I spent a while tracing this,
watching it go wrong, until I realized that the yy_switch_to_buffer() call
was hosing the buffer.

I have two approaches to try.  The first is to call yy_switch_to_buffer
from within the rule rather than from outside.  I will try adding the code
to the processing of "From".

If that fails, then I will work on forcing in EOFs and moving detection of 
From, mime boundaries and header ends to the code outside of the lexer.

Feeding the lexer artificial EOFs at the end of every section is probably 
clean enough to work unconditionally.
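
As a rough idea of what the outside-the-lexer detection could look like,
here is a sketch that finds only the "From " boundaries in a block of
mailbox text held in memory.  The function name section_len is
hypothetical, and mime boundaries and header ends are not handled.  Each
section it delimits could then be given to the lexer as a complete input
of its own (flex's yy_scan_bytes() is one way to do that), so the lexer
sees an end of input at every boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return the length of the section starting at p: everything up to, but
   not including, the next line that begins with "From ", or up to end if
   there is no such line.  The line at p itself is never a boundary. */
static size_t section_len(const char *p, const char *end)
{
    const char *q = p;

    assert(p <= end);
    while (q < end) {
        const char *nl = memchr(q, '\n', (size_t)(end - q));

        if (nl == NULL)
            return (size_t)(end - p);     /* no more lines */
        q = nl + 1;                        /* start of the next line */
        if ((size_t)(end - q) >= 5 && memcmp(q, "From ", 5) == 0)
            return (size_t)(q - p);        /* boundary found */
    }
    return (size_t)(end - p);
}
```

Calling it repeatedly, advancing p by the returned length each time, walks
the mailbox one message at a time.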

--
SPAM: Trademark for spiced, chopped ham manufactured by Hormel.
spam: Unsolicited, Bulk E-mail, where e-mail can be interpreted generally 
to mean electronic messages designed to be read by an individual, and it 
can include Usenet, SMS, AIM, etc.  But if it is not all three of 
Unsolicited, Bulk, and E-mail, it simply is not spam. Misusing the term 
plays into the hands of the spammers, since it causes confusion, and 
spammers thrive on  confusion. Spam is not speech, it is an action, like 
theft, or vandalism. If you were not confused, would you patronize a spammer?
Nick Simicich - njs at scifi.squawk.com - http://scifi.squawk.com/njs.html
Stop by and light up the world!



More information about the bogofilter-dev mailing list