What has become of buff and word and fgetsl?

Nick Simicich njs at scifi.squawk.com
Thu Feb 27 19:10:40 CET 2003


At 11:45 PM 2003-02-26 -0500, David Relson wrote:

>Flex seems to use 8k for its basic buffer size.  When reading Greg's file, 
>flex first gets a qp line (76 x's), then tries to match a rule.  The rule 
>indicates that a longer token can be matched.  Flex provides a partial 
>buffer (8192-76) for the second request.  Another 76 char line is 
>read.  Match token.  Need more data.  Loop until the buffer is full.
>
>That's as far as I traced it when I encountered the fgetsl() problem a 
>couple of days ago.  I expect that what flex does is expand the buffer, 
>read until full, expand, ...  I presume this continues until enough of the 
>file is read in to match the pattern.  For 3.txt that amount is 100K and 
>for 4.txt it's 600K.

That is exactly what it does, and that is also why those patterns I sent 
you (the ones on which work is suspended) run so much faster.  Each time 
flex expands the buffer, there is also a huge amount of stepping back 
through the state machine to get back to where you were in the token.
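
A rough way to see why the stepping back hurts is a toy cost model in C. 
This is not flex's code: the function names and the cost model are 
assumptions made for illustration.  It assumes one token spans token_len 
bytes, input arrives line_len bytes at a time, and the scanner retraces 
the token from its start each time more input arrives.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy cost model, NOT flex's real algorithm (hypothetical names).
 * Assumption: after every read of up to line_len bytes, the scanner
 * re-examines the whole token seen so far.
 * Returns the total number of bytes examined.
 */
unsigned long long rescan_cost(size_t token_len, size_t line_len)
{
    unsigned long long examined = 0;
    size_t have = 0;                    /* bytes of the token read so far */

    assert(line_len > 0);
    while (have < token_len) {
        size_t left  = token_len - have;
        size_t chunk = left < line_len ? left : line_len;

        have += chunk;
        examined += have;               /* start over from the token start */
    }
    return examined;
}

/* Same input, but every byte is examined exactly once (no backing up). */
unsigned long long single_pass_cost(size_t token_len)
{
    return token_len;
}
```

Under these assumptions, with a fixed line length, the first figure grows 
roughly with the square of the token length and the second only linearly, 
which is the kind of gap short tokens avoid.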

>>I still wonder if this "buffer offsetting" technique is the right thing
>>to do. It harms efficiency by bringing OS syscall overhead upon
>>ourselves. I'd agree with that code in the HTML comment killer (after
>>all, things are getting shorter), but the scheme for READING from a file
>>is heading for the brick wall.
>
>Yes indeed.  There's a brick wall up ahead.

You have to consider not moving things as much if you want efficiency...

>>I mean, what happens when the buffer is ultimately too small and the
>>read request cannot be satisfied (e.g. you have four bytes left, but
>>you must fit "weather\n")? When is the buffer drained and the ->read
>>pointer reset? All this is a mystery to me currently. Is the ->read
>>stuff necessary?

Flex will enlarge its buffer a small bit at a time until it gets a "no 
memory" from malloc.
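
For a hand-rolled reader, the "four bytes left" case can be handled in the 
same spirit.  The sketch below is an assumption-laden illustration, not 
bogofilter's or flex's code: struct inbuf, its fields and inbuf_reserve() 
are hypothetical names, and the doubling growth policy is a choice made 
for the sketch, not a claim about what flex does.  It moves nothing while 
there is room, drains and resets the read offset only when the tail is 
too short, and grows the buffer only if draining is not enough.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical buffer type; the field names are illustrative only. */
struct inbuf {
    char  *data;   /* heap storage, may be NULL when size == 0 */
    size_t size;   /* allocated bytes */
    size_t read;   /* offset of the first unconsumed byte */
    size_t fill;   /* offset one past the last valid byte */
};

/*
 * Make room for at least `need` more bytes after `fill`.
 * Returns 0 on success, -1 if the buffer cannot grow (contents intact).
 */
int inbuf_reserve(struct inbuf *b, size_t need)
{
    assert(b->read <= b->fill && b->fill <= b->size);

    if (b->size - b->fill >= need)
        return 0;                       /* enough room: move nothing */

    /* Drain: slide unread bytes to the front, reset the read offset. */
    if (b->read > 0) {
        memmove(b->data, b->data + b->read, b->fill - b->read);
        b->fill -= b->read;
        b->read = 0;
    }

    if (b->size - b->fill < need) {
        size_t new_size = b->size ? b->size : 64;   /* 64 is arbitrary */
        char *p;

        while (new_size - b->fill < need) {
            if (new_size > (size_t)-1 / 2)
                return -1;              /* doubling would overflow */
            new_size *= 2;
        }
        p = realloc(b->data, new_size); /* acts like malloc if data is NULL */
        if (p == NULL)
            return -1;                  /* the "no memory" case */
        b->data = p;
        b->size = new_size;
    }
    return 0;
}
```

A caller would invoke inbuf_reserve(&b, n) before each read of up to n 
bytes, append at b.data + b.fill, and advance b.read as tokens are 
consumed, so the memmove only happens when the tail runs out.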

--
SPAM: Trademark for spiced, chopped ham manufactured by Hormel.
spam: Unsolicited, Bulk E-mail, where e-mail can be interpreted generally 
to mean electronic messages designed to be read by an individual, and it 
can include Usenet, SMS, AIM, etc.  But if it is not all three of 
Unsolicited, Bulk, and E-mail, it simply is not spam. Misusing the term 
plays into the hands of the spammers, since it causes confusion, and 
spammers thrive on  confusion. Spam is not speech, it is an action, like 
theft, or vandalism. If you were not confused, would you patronize a spammer?
Nick Simicich - njs at scifi.squawk.com - http://scifi.squawk.com/njs.html
Stop by and light up the world!



More information about the bogofilter-dev mailing list