What has become of buff and word and fgetsl?
Nick Simicich
njs at scifi.squawk.com
Thu Feb 27 19:10:40 CET 2003
At 11:45 PM 2003-02-26 -0500, David Relson wrote:
>Flex seems to use 8k for its basic buffer size. When reading Greg's file,
>flex first gets a qp line (76 x's), then tries to match a rule. The rule
>indicates that a longer token can be matched. Flex provides a partial
>buffer (8192-76) for the second request. Another 76 char line is
>read. Match token. Need more data. loop till buffer is full.
>
>That's as far as I traced it when I encountered the fgetsl() problem a
>couple of days ago. I expect that what flex does is expand the buffer,
>read till full, expand, ... I presume this continues until enough of the
>file is read in to match the pattern. For 3.txt that amount is 100K and
>for 4.txt it's 600k.
That is exactly what it does, and that is also why the patterns I sent
you (the ones on which work is suspended) run so much faster. There is also
a huge amount of stepping back through the state machine to get back to
where you were in the token each time the buffer is expanded.
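As a rough illustration of why the stepping back hurts, here is a minimal
sketch in C of a simplified cost model. It assumes that every refill
re-walks the whole partial token from its start and that input arrives in
fixed-size chunks. The name rescan_cost and its parameters are hypothetical;
this is not code from flex or bogofilter.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Simplified model (an assumption, not flex's actual code): the token
 * arrives in chunks of `chunk` bytes, and after every chunk the scanner
 * re-walks all bytes of the token buffered so far.
 * Returns the total number of bytes walked.
 */
unsigned long long rescan_cost(size_t token_len, size_t chunk)
{
    unsigned long long cost = 0;
    size_t have = 0;            /* bytes of the token buffered so far */

    assert(chunk > 0);
    while (have < token_len) {
        size_t left = token_len - have;
        have += (left < chunk) ? left : chunk;  /* one more refill */
        cost += have;           /* re-walk the partial token from its start */
    }
    return cost;
}
```

In this model the total grows roughly with the square of the token length
divided by the chunk size, so rescan_cost(100000, 76) comes out far larger
than rescan_cost(100000, 8192).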
>>I still wonder if this "buffer offsetting" technique is the right thing
>>to do. It harms efficiency by bringing OS syscall overhead upon
>>ourselves. I'd agree with that code in the HTML comment killer (after
>>all, things are getting shorter), but the scheme for READING from a file
>>is heading for the brick wall.
>
>Yes indeed. There's a brick wall up ahead.
You have to consider not moving things as much if you want efficiency...
>>I mean, what happens when the buffer is ultimately too small and the
>>read request cannot be satisfied (e.g. you have four bytes left, but
>>you must fit "weather\n")? When is the buffer drained and the ->read
>>pointer reset? All this is a mystery to me currently. Is the ->read
>>stuff necessary?
Flex will enlarge its buffer a small bit at a time until it gets a "no
memory" from malloc.
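The general "grow until the allocator says no" pattern can be sketched in C
like this. It is only an illustration under stated assumptions: struct
growbuf, growbuf_expand and growbuf_reserve are hypothetical names, the step
size is a free parameter, and none of this is taken from flex's source.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical buffer type; the names are illustrative, not flex's. */
struct growbuf {
    char  *data;
    size_t cap;   /* bytes allocated */
    size_t len;   /* bytes in use, always <= cap */
};

/*
 * Enlarge the buffer by `step` bytes. Returns 1 on success, 0 when the
 * size would overflow or realloc reports "no memory". On failure the
 * old buffer is left untouched.
 */
int growbuf_expand(struct growbuf *b, size_t step)
{
    char *p;

    assert(b != NULL && step > 0);
    if (b->cap > (size_t)-1 - step)
        return 0;                       /* size_t overflow */
    p = realloc(b->data, b->cap + step);
    if (p == NULL)
        return 0;                       /* allocator refused */
    b->data = p;
    b->cap += step;
    return 1;
}

/* Keep enlarging until `need` more bytes fit, or the allocator gives up. */
int growbuf_reserve(struct growbuf *b, size_t need, size_t step)
{
    while (b->cap - b->len < need) {
        if (!growbuf_expand(b, step))
            return 0;
    }
    return 1;
}
```

For the case in the question, with four bytes free and the eight bytes of
"weather\n" to fit, growbuf_reserve(&b, 8, 4) would enlarge the buffer once
and then succeed.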
--
SPAM: Trademark for spiced, chopped ham manufactured by Hormel.
spam: Unsolicited, Bulk E-mail, where e-mail can be interpreted generally
to mean electronic messages designed to be read by an individual, and it
can include Usenet, SMS, AIM, etc. But if it is not all three of
Unsolicited, Bulk, and E-mail, it simply is not spam. Misusing the term
plays into the hands of the spammers, since it causes confusion, and
spammers thrive on confusion. Spam is not speech, it is an action, like
theft, or vandalism. If you were not confused, would you patronize a spammer?
Nick Simicich - njs at scifi.squawk.com - http://scifi.squawk.com/njs.html
Stop by and light up the world!
More information about the bogofilter-dev mailing list