Bogofilter and zmailer

Matthias Andree matthias.andree at gmx.de
Fri May 23 01:02:10 CEST 2003


Marek Kowal <marek.kowal at portal.onet.pl> writes:

> So how about just the option for the configuration file? I would provide
> appropriate patches. The point is, bogofilter undergoes very rapid changes,
> so I would have to adapt my patch every one to two weeks, unless the change
> goes to the main repository. That's why I specifically asked for the bulk
> mode to be put into the production sources a few weeks ago - for exactly the
> same reason.

Well, we seem not to be able to avoid your performance (ambiguity intended) :-)

David, how about if we offer a hook to plug filters before the lexer?

Early in the stream, like so:

unbatcher -> filter -> bogolexer -> algorithm -> output

We have somewhat modularized bogolexer (lexer_v3 vs. the other stuff)
and algorithm (g/r/rf), and the output mode is somewhat configurable
(it's tied to the input currently).

The filter could, for example, ignore (or reformat) the ZMailer-specific
format so the lexer either doesn't see it or can make sense of it, and the
unbatcher would, for example, break up an mbox file or iterate over a
maildir. Thinking about how all that stuff is currently implemented, we
already have a good deal of functional programming buried in our code.
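
To make the staging above a bit more concrete, here is a rough sketch in
Python (not bogofilter's C code, and not a proposal for the actual API).
Every name in it is made up for illustration, the "X-Envelope: " prefix is
only a stand-in for whatever ZMailer really prepends, and the lexer is a toy
word splitter rather than bogolexer. It covers only the first three stages;
algorithm and output would consume the token lists.

```python
import re
from typing import Callable, Iterable, Iterator, List

Message = List[str]
Filter = Callable[[Message], Message]

# Hypothetical marker: stands in for ZMailer-specific lines, whose real
# format is not shown in this thread.
ENVELOPE_PREFIX = "X-Envelope: "


def unbatch_mbox(lines: Iterable[str]) -> Iterator[Message]:
    """Unbatcher: split an mbox-style stream at 'From ' separator lines."""
    message: Message = []
    for line in lines:
        if line.startswith("From "):
            if message:
                yield message
            message = []
            continue  # the separator itself is not part of the message
        message.append(line)
    if message:
        yield message


def strip_envelope(message: Message) -> Message:
    """Filter: drop the (hypothetical) transport-specific lines."""
    return [line for line in message if not line.startswith(ENVELOPE_PREFIX)]


def toy_lexer(message: Message) -> List[str]:
    """Stand-in for the lexer: lowercase runs of letters and digits."""
    tokens: List[str] = []
    for line in message:
        tokens.extend(re.findall(r"[a-z0-9]+", line.lower()))
    return tokens


def run_pipeline(lines: Iterable[str], filters: List[Filter]) -> Iterator[List[str]]:
    """unbatcher -> filter(s) -> lexer, yielding one token list per message."""
    for message in unbatch_mbox(lines):
        for apply_filter in filters:
            message = apply_filter(message)
        yield toy_lexer(message)


batch = [
    "From a@example.org Fri May 23 01:02:10 2003",
    "X-Envelope: rcpt b@example.org",
    "Subject: hello",
    "cheap pills",
    "From c@example.org Fri May 23 01:03:00 2003",
    "Subject: meeting",
]
for tokens in run_pipeline(batch, [strip_envelope]):
    print(tokens)
```

With an empty filter list the envelope line reaches the lexer unchanged, which
is the point of the hook: the lexer itself stays ignorant of ZMailer.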

I wonder if it's worthwhile exposing the individual stages in the API,
so other pieces of software could use them. We already have one such
piece, bogolexer. bogoutil is also heading in that direction, if only
for practical reasons of the "code reuse" kind for now.

This would keep maintainability at a bearable level without making the
lexer too fat.

-- 
Matthias Andree



