What has become of buff and word and fgetsl?

David Relson relson at osagesoftware.com
Tue Feb 25 02:26:57 CET 2003


At 08:17 PM 2/24/03, Matthias Andree wrote:

>David Relson <relson at osagesoftware.com> writes:
>
> > Bogofilter needs to append new text to a partially filled
> > buffer. That ability is used in get_decoded_line() and in
> > process_html_comments().  The buff_shrink() and buff_expand() routines
> > adjust the buffer's pointers to make this possible.
>
>But these come in pairs.

True.  The pairing exists to change the read position for fgetsl() and then
restore it so that the accumulated text can be processed.  Possibly
setting a read position is all that is needed (see the "read" variable
below).  However, I haven't tried it yet.
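To make the shrink/expand pairing concrete, here is a minimal sketch of how such a pair could let a line reader append to accumulated text. Only the names buff_shrink(), buff_expand(), word_t, buff_t and the text/leng fields come from this discussion; the function bodies and signatures, the "size" field, and the helpers read_line() (a hypothetical stand-in for fgetsl()) and append_line() are assumptions for illustration, not bogofilter's actual code. Because the sketch uses fgets() and strlen(), it would mishandle lines containing NUL bytes.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical layouts: text/leng and the embedded word_t follow the
   discussion; the "size" field is an assumption for this sketch. */
typedef struct {
    unsigned char *text;   /* start of the visible text */
    size_t leng;           /* number of valid bytes at text */
} word_t;

typedef struct {
    word_t t;              /* embedded word_t */
    size_t size;           /* capacity available from t.text onward */
} buff_t;

/* Hide the `used` bytes already accumulated so that the next read
   starts writing right after them. */
static void buff_shrink(buff_t *b, size_t used)
{
    assert(used <= b->t.leng && used <= b->size);
    b->t.text += used;
    b->t.leng -= used;
    b->size   -= used;
}

/* Undo a matching buff_shrink(): the hidden bytes and the newly read
   bytes become one contiguous run of text again. */
static void buff_expand(buff_t *b, size_t used)
{
    b->t.text -= used;
    b->t.leng += used;
    b->size   += used;
}

/* Hypothetical stand-in for fgetsl(): read one line into the visible
   part of the buffer. Returns bytes read, 0 at EOF (leng unchanged). */
static size_t read_line(buff_t *b, FILE *fp)
{
    if (b->size < 2 || fgets((char *)b->t.text, (int)b->size, fp) == NULL)
        return 0;
    b->t.leng = strlen((char *)b->t.text);
    return b->t.leng;
}

/* Append the next line to whatever the buffer already holds. */
static size_t append_line(buff_t *b, FILE *fp)
{
    size_t used = b->t.leng;
    size_t n;

    buff_shrink(b, used);      /* first half of the pair */
    n = read_line(b, fp);
    buff_expand(b, used);      /* second half: always restore */
    return n;
}
```

The pairing is what keeps this safe: every buff_shrink() must be followed by a buff_expand() with the same count, or the buffer's pointers are left describing only the tail of the text.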

> > I think that it can be coded differently by storing additional
> > information in the buff_t struct.  The "read" variable is a start
> > towards that goal.  However, it is not yet implemented.
>
>It's a relief to read that.

Actually, I used the invisible font so you wouldn't be able to find 
references to the variable. (Ducks and runs...)
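For comparison, the not-yet-implemented "read" variable idea could replace the shrink/expand pair with a single stored offset: the reader writes at text + read, and leng then covers both the old and the new text. This is only a guess at the intended design. The field name "read" comes from the message, but its meaning here, the "size" field, and buff_read_line() are assumptions for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef struct {
    unsigned char *text;
    size_t leng;
} word_t;

/* Hypothetical: "read" marks the offset where the next read writes. */
typedef struct {
    word_t t;
    size_t read;   /* offset at which the next read begins */
    size_t size;   /* total capacity of t.text (assumed field) */
} buff_t;

/* Hypothetical fgetsl()-like reader that honours b->read: new data goes
   to text + read, and leng is extended to the end of it. Uses fgets()
   and strlen(), so embedded NUL bytes are not handled. */
static size_t buff_read_line(buff_t *b, FILE *fp)
{
    char *dest;
    size_t room, n;

    assert(b->read <= b->t.leng && b->t.leng <= b->size);
    room = b->size - b->read;
    dest = (char *)b->t.text + b->read;
    if (room < 2 || fgets(dest, (int)room, fp) == NULL)
        return 0;                 /* EOF or no space: leng unchanged */
    n = strlen(dest);
    b->t.leng = b->read + n;      /* old text [0, read) plus new line */
    return n;
}
```

Appending then amounts to setting `b.read = b.t.leng` before the next call, and starting fresh to `b.read = 0`, with nothing to undo afterwards.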


> > Also, there are times when bogofilter has a buff_t and calls a function
> > that needs a word_t.  Rather than create a new word_t from the text and
> > leng fields of the buff_t, I found it expedient to include a word_t
> > within the buff_t.  From an object-oriented point of view, the word_t is
> > the superclass and the buff_t is a subclass.
>
>That's fine.
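A minimal sketch of the embedding, assuming a word_t with the text and leng fields named above and a buff_t whose first member is that word_t (the "read" and "size" fields and word_count_byte() are hypothetical). A routine written against word_t can be handed &b.t directly, so nothing is copied. Because the word_t is the first member, C also guarantees that a pointer to the buff_t, suitably converted, points to it.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    unsigned char *text;
    size_t leng;
} word_t;

/* The word_t comes first, so a buff_t "is a" word_t. */
typedef struct {
    word_t t;
    size_t read;   /* hypothetical extra buffer state */
    size_t size;   /* hypothetical capacity field */
} buff_t;

/* Hypothetical routine that only knows about word_t. */
static size_t word_count_byte(const word_t *w, unsigned char c)
{
    size_t i, n = 0;

    assert(w != NULL);
    for (i = 0; i < w->leng; i++)
        if (w->text[i] == c)
            n++;
    return n;
}
```

Usage: given `buff_t b`, call `word_count_byte(&b.t, ' ')`. The explicit `&b.t` is clearer than casting `&b`, even though both refer to the same object.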
>
> >>Note that fgetsl wouldn't have to be aware of buff at all, a wrapper
> >>could take care of that, but I'm not separating the buff out now.
> >
> > Providing a more uniform interface using buff_t and word_t lessens
> > the need for wrappers.  That's a good thing (TM).
>
>As long as it's understandable, yes.
>
> >>My fix doesn't really speed things up though :-(
> >
> > The fgetsl() fix forced the lexer to process the _whole_ file.  Without
> > the fix, the lexer was processing approx. 20K of the 100K.  Because the
> > lexer's processing time doesn't seem to grow linearly with the number of
> > characters, the fgetsl() fix caused the time to process the test file to
> > increase more than fivefold.
>
>Yup. And I wonder whether the issue here is memory allocation or backing
>up in the scanner.

Profiling clearly shows that 99.9% of the time is spent inside flex
code.  I doubt it's memory allocation; backing up is the more likely cause.