program size

David Relson relson at osagesoftware.com
Fri Aug 1 20:03:48 CEST 2003


At 01:45 PM 8/1/03, Matthias Andree wrote:
>David Relson <relson at osagesoftware.com> writes:
>
> > It has been suggested that an alternative to flex might exist.  What do
> > y'all know of this subject?  Anybody interested in taking on the task of
> > testing alternatives?
>
>Offhand, I know about ANTLR/DLG (recent versions only emit C++ and Java
>though), and I seem to recall something like e2c or something that also
>builds parsers, and we might do individual parsers manually.
>
>Is there some profiling option in terms of code complexity in flex?

What caused me to suspect flex was that changes to the QP pattern caused
large changes in generated code size.  To see it yourself, check the current
sizes of lexer_v3.l and lexer_v3.c.  Then change:

QP [!->@-~]+
to:
QP [^[:blank:]\?]

The executable size difference is 200K (or so).
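
To get a feel for how different the two classes are, here is a rough
sketch in Python (not flex) that counts which of the 256 byte values each
class accepts.  It assumes an 8-bit scanner and that [:blank:] means only
space and tab, as in the C locale; the function names are made up for
illustration.  It does not explain why flex's tables grow, it only shows
how much wider the second class is.

```python
# Rough model of the two QP character classes over 8-bit input.
# Assumption: [:blank:] is space and tab only (C locale).

def in_old_class(b: int) -> bool:
    # [!->@-~] : '!' through '>' plus '@' through '~'
    return 0x21 <= b <= 0x3E or 0x40 <= b <= 0x7E

def in_new_class(b: int) -> bool:
    # [^[:blank:]\?] : anything except space, tab and '?'
    return b not in (0x20, 0x09, 0x3F)

old = {b for b in range(256) if in_old_class(b)}
new = {b for b in range(256) if in_new_class(b)}

print(len(old), len(new))   # prints: 93 253
```

Under those assumptions the negated class also admits control characters,
newline and every byte with the high bit set, none of which the original
range accepts.  Note too that the second pattern, as written above, has no
trailing "+".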

At the moment, I'm testing some flex options, such as "-Cf" and "-CF".  They
don't look too promising, as some of them cause gcc to segfault.

>OTOH, can we make some "functional" (or equivalent) approach for our
>lexers (call it daisy chaining if you like)? If we have one "parser"
>call into another, we might split "parsers" across logical levels. A
>low-level one could parse the message into MIME parts and swallow the
>attachments, a higher level one might do what the old lexer_text or
>lexer_html did. These lexer_{text,html} would call into the mime parser
>when they need more input. I hope the low-level lexer doesn't need to
>buffer much more than a line of input.
>
>The old multi-lexer approach suffered because we switched between lexers
>rather than cascading them, which led to troubles with mail boundary
>detection (we had to have mbox "^From " recognition code in all the
>lexers).

It might work.  I never found the key to partitioning the lexer into 
sections or levels.  Perhaps you'll do better :-)
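
For what it's worth, the cascading idea can be sketched outside flex.  The
following is a toy in Python, not bogofilter code: the names low_level,
text_tokens and MSG_START are hypothetical, and the "MIME handling" is
just a Content-Type line switching between keep and skip.  The point is
only the shape: the upper layer pulls lines from the lower one, and the
mbox "^From " check lives in exactly one place.

```python
import re
from typing import Iterable, Iterator

MSG_START = "\x00MSG"           # hypothetical marker for a message boundary
WORD = re.compile(r"[A-Za-z]+")  # stand-in for the real token rules

def low_level(lines: Iterable[str]) -> Iterator[str]:
    """Bottom layer: sees raw lines, detects "^From ", hides non-text parts."""
    skipping = False
    for line in lines:
        if line.startswith("From "):
            skipping = False
            yield MSG_START
            continue
        if line.lower().startswith("content-type:"):
            kind = line.split(":", 1)[1].strip().lower()
            skipping = not kind.startswith("text/")
            continue
        if not skipping:
            yield line

def text_tokens(source: Iterator[str]) -> Iterator[str]:
    """Upper layer: pulls one line at a time and never sees "From " lines."""
    for line in source:
        if line == MSG_START:
            yield "<msg>"
        else:
            yield from WORD.findall(line)

mbox = [
    "From alice Fri Aug  1 20:03:48 2003\n",
    "Content-Type: text/plain\n",
    "cheap pills\n",
    "Content-Type: image/gif\n",
    "R0lGODlhAQABAIAAAAAAAP\n",
    "From bob Fri Aug  1 20:05:00 2003\n",
    "Content-Type: text/plain\n",
    "hello there\n",
]
tokens = list(text_tokens(low_level(mbox)))
print(tokens)
```

Because both layers are generators, the lower one hands over a single line
per request, which is in the spirit of not buffering much more than a line
of input.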




