Lexer restructure.

Tue Jan 28 08:39:25 CET 2003

Matt Armstrong <matt at lickey.com> writes:

> David Relson <relson at osagesoftware.com> writes:
>
>> On the way into Dearborn (50 km) this morning, I was thinking about
>> your "lexer restructure" message.  About the time I started
>> thinking "multiple levels of mime encoding", I had the very same
>> 0.11 thought you've had.
>
> I've implemented a MIME aware mail processing library in Ruby where
> each layer of the parsing process is represented by a different
> parser class.  They are just:
>
>     mailbox parser
>     message parser

I totally left out:

    multipart parser

[...]

> The message parser just takes an input stream and parses it like a
> message.  It parses the header, then depending on the MIME fields in
> the header parses the body as a single part or a multipart.  The
> multipart parser actually creates NEW message parser objects for each
> sub-part and in this way parses them recursively.

What I meant to say was that if the message parser finds that its body
is a multipart/*, then it creates a multipart parser and feeds it its
own input stream.  It is then used much like a mailbox parser: create
a message parser with the multipart parser as its input source, call
"next", repeat if not at the end of the multipart.

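A rough Ruby sketch of that arrangement, for illustration only.  It is
not the code of the library described here: every class and method
name below (MultipartParser, MessageParser, next_part, done?) is an
assumption, and boundary handling is simplified to exact line matches.
The point it shows is that the multipart parser behaves as a line
source that reports end of input at each boundary, so an ordinary
message parser can read a single part from it.

```ruby
require "stringio"

# All names in this sketch are hypothetical.
Message = Struct.new(:headers, :body, :parts)

# Line source for the parts of a multipart body.  gets returns nil at a
# boundary line, so a message parser reading from it sees one part as
# if it were a complete input stream.
class MultipartParser
  def initialize(source, boundary)
    @source = source
    @delim  = "--#{boundary}"
    @close  = "--#{boundary}--"
    skip_preamble
  end

  def done?
    @state == :done
  end

  def gets
    return nil unless @state == :part
    line = @source.gets
    if line.nil? || line.chomp == @close
      @state = :done
      nil
    elsif line.chomp == @delim
      @state = :boundary
      nil
    else
      line
    end
  end

  # Advance to the next part; returns false at the end of the multipart.
  def next_part
    while gets
      # discard anything the part's parser left unread
    end
    @state = :part if @state == :boundary
    !done?
  end

  private

  # Skip everything up to the first boundary line.
  def skip_preamble
    @state = :done
    while (line = @source.gets)
      case line.chomp
      when @delim then return @state = :part
      when @close then return
      end
    end
  end
end

class MessageParser
  BOUNDARY = %r{\Amultipart/.*?boundary="?([^";]+)"?}i

  def initialize(source)
    @source = source
  end

  def parse
    headers  = parse_headers
    boundary = headers["content-type"].to_s[BOUNDARY, 1]
    return Message.new(headers, read_body, []) unless boundary

    multipart = MultipartParser.new(@source, boundary)
    parts = []
    until multipart.done?
      parts << MessageParser.new(multipart).parse  # recursion happens here
      multipart.next_part
    end
    Message.new(headers, nil, parts)
  end

  private

  def parse_headers
    headers = {}
    name = nil
    while (line = @source.gets&.chomp) && !line.empty?
      if line =~ /\A[ \t]/ && name
        headers[name] << " " << line.strip          # folded continuation
      elsif line =~ /\A([^:\s]+):\s*(.*)/
        name = $1.downcase
        headers[name] = $2.dup
      end
    end
    headers
  end

  def read_body
    body = +""
    while (line = @source.gets)
      body << line
    end
    body
  end
end
```

With this, MessageParser.new(StringIO.new(text)).parse returns a
Message whose parts array holds one Message per sub-part.  A nested
multipart needs no extra code: the inner MultipartParser simply reads
from the outer one.
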
[...]

>     message parser - for one of the parts within the multipart
>     message parser - for top level multipart
>     mbox parser - to process From: lines

You don't get the above.  If you are parsing the text/plain body part
of a multipart out of an mbox, you have these nested parsers:

    message parser       - checking for RFC 2822 message structure
    multipart parser     - checking for MIME "--" boundary lines
    message parser       - checking for RFC 2822 message structure
    mbox parser          - checking for "From " lines

Again, for bogofilter this is overkill but the design mirrors the
nature of the problem closely.
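
The bottom layer of that stack, the mbox parser, might be sketched in
Ruby roughly as follows.  The names (MboxParser, next_message,
read_headers, read_mbox) are made up for illustration, and the "From "
test is deliberately naive: it ignores the blank-line and ">From "
quoting conventions real mbox files rely on.  read_headers is only a
stand-in for the message parser layer that would sit on top.

```ruby
require "stringio"

# Hypothetical sketch: presents one message of an mbox as a line source.
# gets returns nil at the next "From " line, the way an IO does at EOF.
class MboxParser
  def initialize(io)
    @io = io
    @at_from = false
    first = @io.gets
    @eof = !(first && first.start_with?("From "))
  end

  def eof?
    @eof
  end

  def gets
    return nil if @eof || @at_from
    line = @io.gets
    if line.nil?
      @eof = true
      nil
    elsif line.start_with?("From ")
      @at_from = true
      nil
    else
      line
    end
  end

  # Advance to the next message; returns false at the end of the mbox.
  def next_message
    while gets
      # discard anything the message parser left unread
    end
    @at_from = false
    !@eof
  end
end

# Stand-in for the message parser: header lines up to the blank line.
def read_headers(source)
  lines = []
  while (line = source.gets&.chomp) && !line.empty?
    lines << line
  end
  lines
end

# Same loop as for a multipart: parse, call next, repeat until the end.
def read_mbox(io)
  mbox = MboxParser.new(io)
  messages = []
  until mbox.eof?
    headers = read_headers(mbox)
    body = +""
    while (line = mbox.gets)
      body << line
    end
    messages << [headers, body]
    mbox.next_message
  end
  messages
end
```

Because the message layer only ever calls gets on whatever it is
handed, it cannot tell whether it is reading from a file, from an mbox
parser, or from a multipart parser, which is what lets the layers
stack.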