What has become of buff and word and fgetsl?

Matthias Andree matthias.andree at gmx.de
Thu Feb 27 05:28:00 CET 2003


David Relson <relson at osagesoftware.com> writes:

> Right.  It's especially noticeable with Greg's n*100k 'x' files, which
> bogofilter used to only partially process.  Stepping through the code, I
> could see flex ask for 8192 bytes, get 76, ask for 8192-76, get 76 more,
> ask for 8192-76-76, etc.

Why is that? I mean, what is flex trying to do with 8k? Is that its
batch (aka 'swallow the feast in one go') mode at work?

> Changing address and length is a classic method for adding more data
> to an existing buffer.  Bogofilter's code for killing html comments
> uses the same technique to process multiline comments.

I still wonder whether this "buffer offsetting" technique is the right
thing to do. It harms efficiency by bringing extra syscall overhead upon
ourselves. I'd agree with that code in the HTML comment killer (after
all, things are getting shorter), but the scheme for READING from a file
is heading for the brick wall.
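
For illustration, a minimal sketch of the "advance the address, shrink the
length" fill loop described in the quoted text. This is not bogofilter's
code: struct line_source, read_one_line() and fill_buffer() are made-up
names, and the input is an in-memory string standing in for the file. The
reader hands back at most one line per request, so a large buffer costs one
request per line.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory source standing in for the file being read. */
struct line_source {
    const char *data;   /* remaining input */
    size_t      left;   /* bytes remaining */
    unsigned    calls;  /* number of read requests served so far */
};

/* Return at most one line (up to and including '\n'), never more than
 * max bytes. Mimics a line-oriented reader sitting underneath the lexer. */
static size_t read_one_line(struct line_source *src, char *dst, size_t max)
{
    size_t n = 0;

    assert(src != NULL && dst != NULL);
    src->calls++;
    while (n < max && n < src->left) {
        char c = src->data[n];
        dst[n++] = c;
        if (c == '\n')
            break;
    }
    src->data += n;
    src->left -= n;
    return n;
}

/* Fill buf by moving the write address forward and shrinking the
 * request by the same amount. Returns the number of bytes stored. */
static size_t fill_buffer(struct line_source *src, char *buf, size_t size)
{
    size_t used = 0;

    while (used < size) {
        size_t got = read_one_line(src, buf + used, size - used);
        if (got == 0)
            break;              /* end of input */
        used += got;
    }
    return used;
}
```

With three 4-byte lines and a 10-byte buffer, this takes three requests and
the last line is cut after two bytes, which is the situation the next
paragraph asks about.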

I mean, what happens when the buffer is ultimately too small and the
read request cannot be satisfied (e.g. you have four bytes left, but
you must fit "weather\n")? When is the buffer drained and the ->read
pointer reset? All this is a mystery to me currently. Is the ->read
stuff necessary?
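
One common answer to the "four bytes left" case, sketched under assumptions:
slide the unconsumed bytes to the front and reset the read offset before
storing the next line. Whether bogofilter does this is exactly the open
question here; struct rbuf and both functions below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical layout: bytes in [read, used) are filled but unconsumed. */
struct rbuf {
    char  *text;   /* start of storage */
    size_t size;   /* total capacity */
    size_t read;   /* offset of the first unconsumed byte */
    size_t used;   /* offset one past the last filled byte */
};

/* Slide the unconsumed bytes to the front and reset the read offset.
 * Returns the free space available afterwards. */
static size_t rbuf_compact(struct rbuf *b)
{
    size_t pending;

    assert(b->read <= b->used && b->used <= b->size);
    pending = b->used - b->read;
    if (b->read > 0) {
        memmove(b->text, b->text + b->read, pending);
        b->read = 0;
        b->used = pending;
    }
    return b->size - b->used;
}

/* Store one whole line or nothing. Returns 1 if the line was stored,
 * 0 if it does not fit even after compaction. */
static int rbuf_put_line(struct rbuf *b, const char *line, size_t len)
{
    if (len > b->size - b->used && len > rbuf_compact(b))
        return 0;
    memcpy(b->text + b->used, line, len);
    b->used += len;
    return 1;
}
```

In a 12-byte buffer holding 8 bytes of which 6 are consumed, "weather\n"
fits after compaction; a second "weather\n" is refused rather than split.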

I do know it's error-prone, though: when you merged the buff_fgetsl.c
function, you offset the start pointer, but you didn't reduce the
maximum buffer size. Nice heap smasher. Is this ->read REALLY needed?
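
A minimal sketch of the rule that was missed, assuming a hypothetical
descriptor (struct buf, buf_room() and buf_append() are illustrative names,
not the real buff_fgetsl.c interface): whenever the destination is offset
into the buffer, the size limit has to shrink by the same offset.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical buffer descriptor; field names are illustrative only. */
struct buf {
    char  *text;   /* start of storage */
    size_t size;   /* total capacity in bytes */
    size_t used;   /* bytes already filled */
};

/* Space still available behind the data already in the buffer. */
static size_t buf_room(const struct buf *b)
{
    assert(b->used <= b->size);
    return b->size - b->used;
}

/* Append up to len bytes from src and return how many were stored.
 * The destination is offset by b->used, so the limit is buf_room(b);
 * using b->size as the limit would let the copy run past the
 * allocation. */
static size_t buf_append(struct buf *b, const char *src, size_t len)
{
    size_t room = buf_room(b);
    size_t n = len < room ? len : room;

    memcpy(b->text + b->used, src, n);
    b->used += n;
    return n;
}
```

Keeping the subtraction in one helper means callers cannot offset the
pointer and forget the limit separately.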

We'll need two people who've never seen bogofilter to audit the code
before we release "stable 1.0".

I've written a glibc-compatible getline() emulation function with a less
strict license for leafnode that has a static buffer which is extended
when needed -- we could use that to read into buff -- however, anything
that relies on pointers _into_ buff, IOW, anything that relies on
buff.text being invariant, will crash.
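
To make the invariance problem concrete, a sketch of a growable buffer.
This is not the leafnode getline() emulation; struct gbuf, gbuf_append()
and gbuf_at() are hypothetical. Once realloc() may move the storage,
callers have to remember offsets and turn them into pointers only at the
moment of use.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer. */
struct gbuf {
    char  *text;   /* may move whenever the buffer grows */
    size_t size;   /* allocated bytes */
    size_t used;   /* bytes filled */
};

/* Append len bytes, doubling the allocation until they fit.
 * Returns 0 on success, -1 if memory could not be obtained.
 * Overflow checks on the size arithmetic are omitted in this sketch. */
static int gbuf_append(struct gbuf *g, const char *src, size_t len)
{
    if (len == 0)
        return 0;
    if (len > g->size - g->used) {
        size_t want = g->size ? g->size : 16;
        char *p;

        while (want - g->used < len)
            want *= 2;
        p = realloc(g->text, want);
        if (p == NULL)
            return -1;
        g->text = p;    /* pointers taken into the old block are stale */
        g->size = want;
    }
    memcpy(g->text + g->used, src, len);
    g->used += len;
    return 0;
}

/* Resolve a remembered offset to a pointer at the moment of use. */
static const char *gbuf_at(const struct gbuf *g, size_t offset)
{
    assert(offset <= g->used);
    return g->text + offset;
}
```

A token remembered as offset 0 is still reachable through gbuf_at() after
the buffer has grown, whereas a char * saved before the append may point
into freed memory.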

-- 
Matthias Andree



