autodaemon [was: mailbox classification]
David Relson
relson at osagesoftware.com
Thu Jan 30 21:17:57 CET 2003
At 02:43 PM 1/30/03, Chris Wilkes wrote:
>Perhaps I'm missing something here, but isn't part of BF dependent on
>getting individual emails and making score counts based off of that? BF
>would have to split up the emails and process them individually, and
>adding code to handle different mailbox types would be more added code.
>
>What's so difficult about going the "formail" route? Or if you have
>Maildir formatted mailboxes:
> for i in *; do bogofilter -whatever < "$i"; done
Chris,
Remember that bogofilter has two modes - classifying messages and
registering (training). Classifying expects to receive single messages,
such as procmail feeds it. Registering has code to handle mbox-formatted
files so that one run of bogofilter can process many messages. Using formail
to split an mbox into separate messages and feed them to bogofilter one at a
time is much slower than giving bogofilter the whole mbox.
As bogofilter is right now, it can process a mailbox quite quickly. The
downside of that is that bogofilter needs special code to break the mailbox
into pieces. This code is complex and hard to work with. At the moment it
appears to work.
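
To illustrate what "breaking the mailbox into pieces" involves, here is a minimal Python sketch. It is not bogofilter's actual code (which is written in C), and the function name split_mbox is hypothetical. It assumes the simplest mbox convention: a message starts at a line beginning with "From " that follows a blank line or opens the file. Real mailboxes have more edge cases than this handles, which is part of why the real code is hard to work with.

```python
def split_mbox(text):
    """Split mbox-format text into a list of message strings.

    Assumption: a new message begins at a line starting with "From "
    that is either the first line or directly follows a blank line.
    """
    messages = []
    current = []
    prev_blank = True  # the start of the file counts as a boundary
    for line in text.splitlines(keepends=True):
        if line.startswith("From ") and prev_blank and current:
            # Boundary found: close the message collected so far.
            messages.append("".join(current))
            current = []
        current.append(line)
        prev_blank = line.strip() == ""
    if current:
        messages.append("".join(current))
    return messages


sample = (
    "From a@example.com Thu Jan 30 21:17:57 2003\n"
    "Subject: one\n"
    "\n"
    "body one\n"
    "From the start it worked\n"  # body text, not a separator
    "\n"
    "From b@example.com Thu Jan 30 21:18:00 2003\n"
    "Subject: two\n"
    "\n"
    "body two\n"
)
messages = split_mbox(sample)
```

The body line "From the start it worked" stays inside the first message because it does not follow a blank line. A body line that does follow a blank line and begins with "From " would be split incorrectly by this sketch.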
The reason that using formail is slow is that bogofilter must be run
separately for each message. Each run incurs the cost of loading the
program, opening the databases, and so on. Caching in the operating system
can/should/will lessen the penalty, but there are still costs.
What the autodaemon module offers is to have a bogofilter daemon running
all the time, with a small program to pass the email to it and receive
results from it. This is a big win because it avoids the cost of starting
bogofilter for each message. It is also a win for busy systems that may
have multiple instances of bogofilter running.
David