What has become of buff and word and fgetsl?

Mon Feb 24 21:24:23 CET 2003

At 02:52 PM 2/24/03, Matthias Andree wrote:

>Hi,
>
>I just tried to debug an obscure problem with fgetsl, and I no longer
>understand the code.
>
>I have now cleaned up fgetsl.c. Any code that expects fgetsl() to prepend
>what was in the buffer prior to the call will have to be fixed; "make check"
>doesn't reveal such code.
>
>I'll let someone else who has more insight into buff/word code clean up
>/that/ mess. AFAICT, the buff/word code is way too complex. The
>buff_shrink and buff_expand code looks bogus. We don't need all that
>complexity. All that buff needs to be is a compound of length and buffer
>pointer. The relevant functions are buff_new, buff_delete, and maybe the
>str(cpy|cat) or whatever else is actually used. We add abstraction layer
>over abstraction layer without documenting anything. I don't understand,
>for example, how buff and word interact.
>
>I can no longer maintain that buff/word stuff. :-(
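For comparison, the simpler design suggested above might look like the following minimal sketch. It treats a buffer as just a length plus a pointer, managed by buff_new() and buff_delete() with strcpy/strcat-style helpers. It is not bogofilter's actual code: the field layout and the helper names buff_set() and buff_cat() are assumptions for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical minimal buffer: only a length and a pointer to storage. */
typedef struct {
    size_t leng;  /* bytes of valid data in text */
    char  *text;  /* heap storage, kept NUL-terminated for convenience */
} buff_t;

/* Allocate an empty buffer; returns NULL on allocation failure. */
buff_t *buff_new(void)
{
    buff_t *b = malloc(sizeof *b);
    if (b == NULL)
        return NULL;
    b->leng = 0;
    b->text = calloc(1, 1);          /* empty string */
    if (b->text == NULL) {
        free(b);
        return NULL;
    }
    return b;
}

/* Release the buffer and its storage; passing NULL is allowed. */
void buff_delete(buff_t *b)
{
    if (b == NULL)
        return;
    free(b->text);
    free(b);
}

/* strcpy-like: replace the contents with len bytes from src.
   src must not point into b->text (realloc may move it).
   Returns 0 on success, -1 on allocation failure. */
int buff_set(buff_t *b, const char *src, size_t len)
{
    char *p;
    assert(b != NULL && src != NULL);
    p = realloc(b->text, len + 1);
    if (p == NULL)
        return -1;
    memcpy(p, src, len);
    p[len] = '\0';
    b->text = p;
    b->leng = len;
    return 0;
}

/* strcat-like: append len bytes from src (same aliasing rule as above). */
int buff_cat(buff_t *b, const char *src, size_t len)
{
    char *p;
    assert(b != NULL && src != NULL);
    p = realloc(b->text, b->leng + len + 1);
    if (p == NULL)
        return -1;
    memcpy(p + b->leng, src, len);
    b->leng += len;
    p[b->leng] = '\0';
    b->text = p;
    return 0;
}
```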

Matthias,

Unfortunately, the buff/word stuff isn't simple or obvious.

Bogofilter needs to be able to add new text to a partially filled buffer. 
That ability is used in get_decoded_line() and in 
process_html_comments().  The buff_shrink() and buff_expand() routines 
adjust the buffer's pointers to make this possible.  I think it could be 
coded differently by storing additional information in the buff_t 
struct.  The "read" field is a first step toward that goal, but it is 
not yet implemented.
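One way to picture the "store additional information in buff_t" idea is the minimal sketch below. It assumes the extra information is an offset. Instead of moving pointers with buff_shrink()/buff_expand(), the buffer records in a read field where the most recently appended text begins, and new input is simply written after the existing contents. The fixed-size array, the field meanings and buff_append_line() are all assumptions, not bogofilter's code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical buffer that remembers where new data starts, so appending
   never requires moving the text pointer. */
typedef struct {
    char   text[256]; /* storage (fixed size to keep the sketch short) */
    size_t leng;      /* bytes currently held; text[leng] is always '\0' */
    size_t read;      /* offset at which the most recently appended line begins */
} buff_t;

/* Append one line from 'in' after the existing contents.
   Returns the number of bytes added, or 0 at EOF or when the buffer is full. */
size_t buff_append_line(buff_t *b, FILE *in)
{
    size_t room, added;
    assert(b->leng < sizeof b->text);
    room = sizeof b->text - b->leng;      /* includes space for the NUL */
    if (room < 2)
        return 0;                         /* no space for even one byte */
    if (fgets(b->text + b->leng, (int)room, in) == NULL) {
        b->text[b->leng] = '\0';          /* restore invariant after EOF/error */
        return 0;
    }
    b->read = b->leng;                    /* new text starts where old text ended */
    /* strlen() undercounts if the line contains NUL bytes; ignored here. */
    added = strlen(b->text + b->leng);
    b->leng += added;
    return added;
}
```

Callers that only care about the newly read part can look at text + read, while the earlier contents stay in place.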

Also, there are times when bogofilter has a buff_t and calls a function that 
needs a word_t.  Rather than create a new word_t from the text and leng 
fields of the buff_t, I found it expedient to include a word_t within the 
buff_t.  From an object-oriented point of view, word_t is the superclass 
and buff_t is a subclass.
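The embedding can be illustrated with a minimal sketch. The struct layouts and the helper word_count_byte() are hypothetical and simplified. The point is only that code holding a buff_t can pass the address of its embedded word_t to any function that expects a word_t, with no copying.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned char byte;

/* Hypothetical word_t: a counted byte string (the "superclass"). */
typedef struct {
    size_t leng;
    byte  *text;
} word_t;

/* Hypothetical buff_t: a word_t plus buffer-management fields (the "subclass"). */
typedef struct {
    word_t t;     /* embedded word, deliberately the first member */
    size_t read;  /* bookkeeping used only by buffer code */
    size_t size;  /* allocated capacity of t.text */
} buff_t;

/* A function written against word_t; it knows nothing about buff_t. */
size_t word_count_byte(const word_t *w, byte c)
{
    size_t i, n = 0;
    assert(w != NULL);
    for (i = 0; i < w->leng; i++)
        if (w->text[i] == c)
            n++;
    return n;
}
```

With a buff_t *b, a call looks like word_count_byte(&b->t, 'a'). Because t is the first member, C also guarantees that a buff_t pointer converted to word_t * points at t, but passing &b->t explicitly is the clearer of the two.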

>Note that fgetsl wouldn't have to be aware of buff at all, a wrapper
>could take care of that, but I'm not separating the buff out now.

Providing a more uniform interface based on buff_t and word_t lessens the 
need for wrappers.  That's a good thing (TM).
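Keeping fgetsl() unaware of buff_t, as suggested in the quoted text, could look like this minimal sketch. It pairs a plain line reader that works on a char array and returns the number of bytes stored with a thin wrapper that is the only code touching buff_t. The signatures, the -1 end-of-file convention and the name buff_fgetsl() are assumptions, not the actual bogofilter interfaces.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Hypothetical buffer type for the sketch. */
typedef struct {
    char  *text;
    size_t leng;
    size_t size;  /* capacity of text, including room for a trailing NUL */
} buff_t;

/* Buffer-agnostic reader: like fgets(), but returns the number of bytes
   stored (embedded NUL bytes are counted, since getc() is used instead of
   strlen()), or -1 if EOF or an error occurs before anything is read.
   It overwrites buf; it never prepends previous contents. */
int fgetsl(char *buf, int max_size, FILE *in)
{
    int c, n = 0;
    assert(buf != NULL && max_size > 0);
    while (n < max_size - 1 && (c = getc(in)) != EOF) {
        buf[n++] = (char)c;
        if (c == '\n')
            break;
    }
    buf[n] = '\0';
    return (n == 0 && (feof(in) || ferror(in))) ? -1 : n;
}

/* Thin wrapper: the only place that knows about buff_t.
   Replaces b's contents with the next line and updates its length. */
int buff_fgetsl(buff_t *b, FILE *in)
{
    int n;
    assert(b != NULL && b->size > 0 && b->size <= INT_MAX);
    n = fgetsl(b->text, (int)b->size, in);
    b->leng = n < 0 ? 0 : (size_t)n;
    return n;
}
```

The trade-off discussed in this thread is visible here: the reader stays reusable by any caller, at the cost of one extra wrapper per buffer type.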

>My fix doesn't really speed things up though :-(

The fgetsl() fix forced the lexer to process the _whole_ file.  Without the 
fix, the lexer was processing only about 20K of the 100K.  Since the lexer's 
processing time doesn't seem to grow linearly with the number of characters, 
this fivefold increase in input made the fgetsl() fix increase the time to 
process the test file more than fivefold.