uuencoded attachments produce woe

Matthias Andree matthias.andree at gmx.de
Sun Dec 8 15:13:20 CET 2002


David Relson <relson at osagesoftware.com> writes:

> I tried pattern #1 in lexer.l and couldn't get it to work.  As I'm a
> flex novice, I tried a few other variants, but never got the lexer to
> skip over uuencode.
>
> My ideas on separating get_token() from lexer.l and adding
> content-type.c are part of preparing bogofilter to handle the
> Content-Type-Encoding directive :-)

I'm wondering if it would be hard to make lexer.l parsing stateful; if
it isn't, there's no need to split that code out.
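
For what it's worth, stateful lexing in flex is usually done with start
conditions; a rough, hypothetical sketch (these are not actual lexer.l
rules, just the shape the idea could take):

```lex
%x MIMEHDR MIMEBODY

%%
^"MIME-Version:".*                       { BEGIN(MIMEHDR); }
<MIMEHDR>^"Content-Type:".*              { /* note multipart, remember boundary */ }
<MIMEHDR>^"Content-Transfer-Encoding:".* { /* remember the encoding */ }
<MIMEHDR>^\r?\n                          { BEGIN(MIMEBODY); /* blank line ends header */ }
<MIMEBODY>^"--".*                        { /* candidate boundary, check the stack */ }
```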

My idea: parse the MIME header (REQUIRED!), Content-Type (multipart and
message are the interesting primary types here), and
Content-Transfer-Encoding. If the part is multipart, push the boundary
onto a stack; the boundary parser then consults that stack to decide
whether a line is a valid boundary and how many nesting levels to pop.
To keep the implementation simple, we'd only need the ability to push
one token back to the parser and to block a particular rule; the open
question is how much performance and size impact that has (if we need
REJECT rules, for example).
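
A minimal sketch, in C, of how such a boundary stack could work (all
names here are my own invention, not bogofilter code):

```c
#include <string.h>

/* Sketch of the boundary stack idea.  Each nested multipart pushes its
 * boundary string; a candidate line is checked against the stack from
 * the top down. */

#define MAX_DEPTH 16

static const char *boundary_stack[MAX_DEPTH];
static int depth;

static void push_boundary(const char *b)
{
    if (depth < MAX_DEPTH)
        boundary_stack[depth++] = b;
}

/* For a line that opens a part ("--boundary") or closes one
 * ("--boundary--"), return how many nesting levels to pop from the
 * stack; return -1 if the line matches no known boundary.  (A real
 * parser would also insist that nothing but "--" and whitespace
 * follows the boundary.) */
static int boundary_levels(const char *line)
{
    if (strncmp(line, "--", 2) != 0)
        return -1;
    for (int i = depth - 1; i >= 0; i--) {
        size_t n = strlen(boundary_stack[i]);
        if (strncmp(line + 2, boundary_stack[i], n) == 0) {
            int closing = strncmp(line + 2 + n, "--", 2) == 0;
            /* everything nested inside level i is popped; the closing
             * delimiter pops level i itself as well */
            return (depth - 1 - i) + closing;
        }
    }
    return -1;
}
```

The convention here: seeing the innermost boundary again pops nothing
(the next sibling part begins), while a closing "--boundary--" pops one
level more than the corresponding opening line would.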

If that's not possible, we'll have to wrap around get_token() (and might
consider yacc/bison for that...).

But that alone isn't going to get us uudecode support.


One tip, though: as long as you're using flex, you can add "debug"
(without the quotes) to the %option line in lexer.l, recompile, and
play with bogolexer; it will print which rule matched (by line number
in lexer.l) and how much input it consumed. With the editor and the
terminal output in two windows on the same screen, that's reasonably
comfortable to debug with.
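
Concretely, that tip amounts to a one-line change in the options
section of lexer.l:

```lex
%option debug
```

With that set, the generated scanner traces each rule match to stderr;
the trace can be switched off at run time by clearing the yy_flex_debug
variable, which defaults to nonzero when the option is enabled.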

-- 
Matthias Andree



