front end

Matthias Andree matthias.andree at gmx.de
Mon Aug 11 15:40:16 CEST 2003


David Relson <relson at osagesoftware.com> writes:

> Right now, the solution seems to be a front end that breaks the input
> into messages and then passes each message to parsing, registration,
> classification, etc.  Stated differently, bogofilter needs a formail
> type capability.

I agree. We'll need to design some interfaces, again.

I believe we'll end up with a collection of modules that also act as
layers:

READER - clear: breaks a message store (mbox, Maildir) into messages
         unclear: does it also iterate over the messages?
  implementations:
  MBOX_SPLITTER - splits on ^$^From (an empty line followed by a
                  line starting with "From ")
  MAILDIR_READER - iterates over a Maildir
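
A minimal sketch of what MBOX_SPLITTER could look like, assuming the
whole mbox is already in memory and lines end in a bare LF. The names
mbox_split, msg_cb and sum_lengths are made up for illustration and are
not existing bogofilter functions; the callback stands in for "parsing,
registration, classification, etc."

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical callback: receives one message (not NUL-terminated). */
typedef void (*msg_cb)(const char *msg, size_t len, void *ctx);

/* True if the line at p (n bytes long) begins with "From ". */
static int is_from_line(const char *p, size_t n)
{
    return n >= 5 && memcmp(p, "From ", 5) == 0;
}

/* Split an in-memory mbox into messages and return how many were found.
 * A separator is a "From " line at the very start of the buffer or
 * directly after an empty line.  Each message passed to cb includes its
 * own "From " line and any trailing empty line.  Data in front of the
 * first separator is skipped.  cb may be NULL to only count messages. */
size_t mbox_split(const char *buf, size_t len, msg_cb cb, void *ctx)
{
    size_t start = 0, pos = 0, count = 0;
    int prev_blank = 1;  /* start of buffer counts as "after an empty line" */
    int in_msg = 0;

    while (pos < len) {
        const char *nl = memchr(buf + pos, '\n', len - pos);
        size_t line_len = nl ? (size_t)(nl - (buf + pos)) + 1 : len - pos;

        if (prev_blank && is_from_line(buf + pos, line_len)) {
            if (in_msg) {            /* close the previous message */
                if (cb)
                    cb(buf + start, pos - start, ctx);
                count++;
            }
            start = pos;
            in_msg = 1;
        }
        prev_blank = (line_len == 1 && buf[pos] == '\n');
        pos += line_len;
    }
    if (in_msg) {                    /* last message runs to end of buffer */
        if (cb)
            cb(buf + start, len - start, ctx);
        count++;
    }
    return count;
}

/* Example callback: adds each message length to a size_t counter. */
void sum_lengths(const char *msg, size_t len, void *ctx)
{
    (void)msg;
    *(size_t *)ctx += len;
}
```

Note that a body line starting with "From " is only treated as a
separator when it follows an empty line, which is why mbox writers
usually quote such lines as ">From ".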

MIME_DECODER - understand MIME structure and decode encoded parts,
               ignore non-text parts

It may also be used to separate header from body if we find a way to
pass that information on to the lexer. I'll read through Postfix's MIME
interpreter, which is a single-pass, non-caching architecture (and can
do qp encoding on the fly if requested), to shoplift some design ideas.
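
To make "decode encoded parts" a bit more concrete, here is a small
single-pass quoted-printable decoder. It is only a sketch and is not
taken from Postfix or bogofilter; qp_decode and hexval are hypothetical
names, and it assumes the caller hands it a complete part rather than
arbitrary chunks (an "=XX" sequence cut in half at the end of the input
is copied through literally).

```c
#include <stddef.h>

/* Value of a hex digit, or -1 if c is not one.  RFC 2045 asks for
 * upper-case digits; lower case is tolerated here. */
static int hexval(unsigned char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Decode quoted-printable text from in[0..len) into out, which must
 * have room for len bytes (decoding never grows the data).
 * Returns the number of bytes written. */
size_t qp_decode(const char *in, size_t len, char *out)
{
    size_t i = 0, o = 0;

    while (i < len) {
        if (in[i] == '=') {
            /* soft line break: "=\n" or "=\r\n" disappears */
            if (i + 1 < len && in[i + 1] == '\n') {
                i += 2;
                continue;
            }
            if (i + 2 < len && in[i + 1] == '\r' && in[i + 2] == '\n') {
                i += 3;
                continue;
            }
            /* "=XX" becomes the byte with hex value XX */
            if (i + 2 < len) {
                int hi = hexval((unsigned char)in[i + 1]);
                int lo = hexval((unsigned char)in[i + 2]);
                if (hi >= 0 && lo >= 0) {
                    out[o++] = (char)(hi * 16 + lo);
                    i += 3;
                    continue;
                }
            }
        }
        out[o++] = in[i++];  /* ordinary byte or malformed '=': copy as is */
    }
    return o;
}
```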

LEXER - what we have now, with ^From stuff removed.

We currently appear to have LEXER and MIME_DECODER combined in one
module, which is imprecise and somewhat dangerous: MIME_DECODER output
can confuse the structural analysis that LEXER currently does, as we
recently found out the hard way. The protocol between LEXER and
MIME_DECODER can be the same as the one between READER and MIME_DECODER
or a different one; I don't have a strong preference here.

The actual implementation of the READER/MIME_DECODER interfaces allows
some variants:

1. the "READER" is also a driver and calls into bogofilter. This would
   lend itself nicely to a library model that lets applications ask a
   bogofilter library to "look at 3874 bytes from 0x1234567c and return
   spamicity".

2. the MIME_DECODER calls into the READER module and tells it "get me
   the next line", but we'd need a protocol for "end of message" and
   "end of all messages". Maybe EOF and EOF EOF do the job, but it's
   ugly.
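
For variant 2, one alternative to "EOF and EOF EOF" would be an
explicit status code per call. The following is only a sketch over an
in-memory buffer with LF line endings; reader, rd_status, reader_init
and reader_next_line are hypothetical names, and treating input without
a leading "From " line as a single message is an assumption of this
sketch.

```c
#include <stddef.h>
#include <string.h>

typedef enum {
    RD_LINE,            /* *line / *line_len describe the next line */
    RD_END_OF_MESSAGE,  /* current message is complete */
    RD_END_OF_INPUT     /* no more messages */
} rd_status;

typedef struct {
    const char *buf;
    size_t len, pos;
    int prev_blank;     /* previous line was empty, or we are at the start */
    int in_msg;         /* a message is currently open */
} reader;

void reader_init(reader *r, const char *buf, size_t len)
{
    r->buf = buf;
    r->len = len;
    r->pos = 0;
    r->prev_blank = 1;
    r->in_msg = 0;
}

/* "Get me the next line": called by the consumer (e.g. MIME_DECODER). */
rd_status reader_next_line(reader *r, const char **line, size_t *line_len)
{
    if (r->pos >= r->len) {
        if (r->in_msg) {             /* close the last message first */
            r->in_msg = 0;
            return RD_END_OF_MESSAGE;
        }
        return RD_END_OF_INPUT;
    }

    const char *p = r->buf + r->pos;
    size_t left = r->len - r->pos;
    const char *nl = memchr(p, '\n', left);
    size_t n = nl ? (size_t)(nl - p) + 1 : left;

    int is_sep = r->prev_blank && n >= 5 && memcmp(p, "From ", 5) == 0;
    if (is_sep && r->in_msg) {
        /* Report the boundary without consuming the "From " line;
         * the next call returns it as the first line of the new message. */
        r->in_msg = 0;
        return RD_END_OF_MESSAGE;
    }

    r->in_msg = 1;
    r->prev_blank = (n == 1 && p[0] == '\n');
    r->pos += n;
    *line = p;
    *line_len = n;
    return RD_LINE;
}
```

A consumer would loop on reader_next_line(), feed RD_LINE results to
the decoder, finish the current message on RD_END_OF_MESSAGE, and stop
on RD_END_OF_INPUT.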

I'd prefer #1.

Opinions?

-- 
Matthias Andree



