front end
Matthias Andree
matthias.andree at gmx.de
Mon Aug 11 15:40:16 CEST 2003
David Relson <relson at osagesoftware.com> writes:
> Right now, the solution seems to be a front end that breaks the input
> into messages and then passes each message to parsing, registration,
> classification, etc. Stated differently, bogofilter needs a formail
> type capability.
I agree. We'll need to design some interfaces, again.
We'll end up with a collection of modules (also layers at the same
time), I believe:
READER - clear: breaks apart some message store (mbox, Maildir)
         unclear: iterate over messages?
         implementation:
           MBOX_SPLITTER - splits on ^$^From .
           MAILDIR_READER - iterates over Maildir

MIME_DECODER - understand MIME structure and decode encoded parts,
               ignore non-text parts
It may also be used to separate header from body if we find a way to
pass this on to the lexer. I'll read through Postfix's MIME interpreter,
which is a single-pass, non-caching architecture (and can do qp encoding
on the fly if requested), to shoplift some design ideas.
LEXER - what we have now, with ^From stuff removed.
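
To make the MBOX_SPLITTER idea concrete, here is a minimal sketch,
assuming the whole mbox is already in memory. The function names are
made up for illustration and are not existing bogofilter code. A
boundary is taken to be a "From " line that follows an empty line; a
body line that starts with "From " right after an empty line looks
exactly the same unless whoever wrote the mbox quoted it as ">From ".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return the offset at which the next message
 * starts, searching from `pos`, or `len` if no further message exists.
 * A message starts at a "From " line preceded by an empty line. */
size_t mbox_next_message(const char *buf, size_t len, size_t pos)
{
    assert(buf != NULL && pos <= len);
    for (size_t i = pos; i + 7 <= len; i++) {
        /* newline ending the previous line, empty line, then "From " */
        if (memcmp(buf + i, "\n\nFrom ", 7) == 0)
            return i + 2;   /* offset of the 'F' */
    }
    return len;
}

/* Hypothetical helper: count the messages in an in-memory mbox. */
size_t mbox_count_messages(const char *buf, size_t len)
{
    size_t count = 0;
    size_t pos = 0;
    while (pos < len) {
        count++;
        pos = mbox_next_message(buf, len, pos);
    }
    return count;
}
```

The empty line before each "From " line ends up as the last line of
the preceding message, so nothing is lost between messages.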
We currently appear to have LEXER and MIME_DECODER combined in one,
which is inexact and somewhat dangerous. MIME_DECODER results can
confuse the structural analysis that LEXER currently does, as has
recently been found out the hard way. The protocol between LEXER and
MIME_DECODER can be the same as, or different from, the protocol
between READER and MIME_DECODER; I don't have a strong preference here.
The actual implementation of the READER/MIME_DECODER interfaces allows
some variants:
1. the "READER" is also a driver and calls into bogofilter. This would
   lend itself nicely to a library model that lets applications query a
   bogofilter library "look at 3874 bytes from 0x1234567c and return
   spamicity".

2. the MIME_DECODER calls into the READER module and tells it "get me
   the next line", but we'd need a protocol for "end of message" and
   "end of all messages". Maybe EOF and EOF EOF do the job, but it's
   ugly.
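
For variant 2, one way around the "EOF and EOF EOF" ugliness might be
an explicit status code on the "get me the next line" call. The sketch
below assumes an in-memory mbox; reader, reader_status and
reader_next_line are hypothetical names, not an existing interface.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical status codes for the "get me the next line" call. */
typedef enum {
    READER_LINE,   /* *line and *n describe one line of the message */
    READER_EOM,    /* end of the current message, more may follow */
    READER_EOF     /* end of all messages */
} reader_status;

/* Hypothetical reader state over an in-memory mbox buffer. */
typedef struct {
    const char *buf;
    size_t len;
    size_t pos;        /* next unread byte */
    int in_message;    /* a line of the current message was handed out */
    int prev_blank;    /* the previous line was empty */
} reader;

void reader_init(reader *r, const char *buf, size_t len)
{
    assert(r != NULL && buf != NULL);
    r->buf = buf;
    r->len = len;
    r->pos = 0;
    r->in_message = 0;
    r->prev_blank = 0;
}

reader_status reader_next_line(reader *r, const char **line, size_t *n)
{
    assert(r != NULL && line != NULL && n != NULL);
    if (r->pos >= r->len) {
        /* Input exhausted: close the open message first, then EOF. */
        if (r->in_message) {
            r->in_message = 0;
            return READER_EOM;
        }
        return READER_EOF;
    }
    const char *start = r->buf + r->pos;
    size_t left = r->len - r->pos;
    /* A "From " line after an empty line starts the next message. */
    if (r->in_message && r->prev_blank &&
        left >= 5 && memcmp(start, "From ", 5) == 0) {
        r->in_message = 0;
        r->prev_blank = 0;
        return READER_EOM;   /* the "From " line is returned next call */
    }
    const char *nl = memchr(start, '\n', left);
    size_t linelen = nl ? (size_t)(nl - start) + 1 : left;
    *line = start;
    *n = linelen;
    r->pos += linelen;
    r->prev_blank = (linelen == 1 && start[0] == '\n');
    r->in_message = 1;
    return READER_LINE;
}
```

The caller loops until READER_EOF and treats READER_EOM as "finish this
message and score it"; lines are returned including their newline.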
I'd prefer #1.
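
Roughly, #1 could look like the sketch below: the READER finds the
message boundaries and hands each message to a "look at len bytes at
msg and return spamicity" entry point. All names are hypothetical, and
stub_classify is only a stand-in that always answers 0.5; it is not
how bogofilter scores anything.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical library entry point: look at `len` bytes at `msg` and
 * return a spamicity value. */
typedef double (*classify_fn)(const char *msg, size_t len, void *ctx);

/* Hypothetical driver READER: splits an in-memory mbox at "From " lines
 * that follow an empty line and calls `classify` once per message.
 * Up to `max_scores` results are stored in `scores` (may be NULL).
 * Returns the number of messages seen. */
size_t reader_drive(const char *buf, size_t len,
                    classify_fn classify, void *ctx,
                    double *scores, size_t max_scores)
{
    assert(buf != NULL && classify != NULL);
    size_t count = 0;
    size_t start = 0;
    while (start < len) {
        size_t end = len;
        for (size_t i = start; i + 7 <= len; i++) {
            if (memcmp(buf + i, "\n\nFrom ", 7) == 0) {
                end = i + 2;   /* next message starts at the 'F' */
                break;
            }
        }
        double s = classify(buf + start, end - start, ctx);
        if (scores != NULL && count < max_scores)
            scores[count] = s;
        count++;
        start = end;
    }
    return count;
}

/* Placeholder classifier for the sketch: adds the message length to
 * the size_t that `ctx` points to and always returns 0.5. */
double stub_classify(const char *msg, size_t len, void *ctx)
{
    (void)msg;
    *(size_t *)ctx += len;
    return 0.5;
}
```

An application that already has single messages in memory would skip
reader_drive and call the classify entry point directly, which is what
makes the library model attractive.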
Opinions?
--
Matthias Andree