autodaemon [was: mailbox classification]

David Relson relson at osagesoftware.com
Thu Jan 30 21:17:57 CET 2003


At 02:43 PM 1/30/03, Chris Wilkes wrote:

>Perhaps I'm missing something here, but isn't part of BF dependent on
>getting individual emails and making score counts based off of that?  BF
>would have to split up the emails and process them individually, and
>adding code to handle different mailbox types would be more added code.
>
>What's so difficult about going the "formail" route?  Or if you have
>Maildir formatted mailboxes:
>   for i in *; do bogofilter -whatever < "$i"; done

Chris,

Remember that bogofilter has two modes: classifying messages and 
registering (training).  Classifying expects to receive a single message, 
as when procmail feeds it one.  Registering has code to handle mbox-formatted 
files, so that one run of bogofilter can process many messages.  Using formail 
to split an mbox into separate messages and feed them to bogofilter one at a 
time is much slower than giving bogofilter the whole mbox.

As it stands, bogofilter can process a mailbox quite quickly.  The 
downside is that bogofilter needs special code to break the mailbox 
into pieces.  This code is complex and hard to work with.  At the moment it 
appears to work.
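
To give a feel for what "breaking the mailbox into pieces" involves, here is a 
minimal Python sketch.  It is not bogofilter's code; the function name 
split_mbox and the sample addresses are made up for illustration.  It assumes 
every line beginning with "From " starts a new message, which real mailboxes 
do not always honor, and that is part of why the real code is harder.

```python
def split_mbox(text):
    """Split mbox-formatted text into a list of message strings.

    Assumption: every line starting with "From " begins a new message.
    An unescaped "From " line inside a message body would fool this,
    which is one reason robust mailbox splitting is hard.
    """
    messages = []
    current = []
    for line in text.splitlines(keepends=True):
        # A separator line closes the message collected so far.
        if line.startswith("From ") and current:
            messages.append("".join(current))
            current = []
        current.append(line)
    if current:
        messages.append("".join(current))
    return messages


# Two tiny made-up messages in mbox form.
sample = (
    "From alice@example.com Thu Jan 30 14:43:00 2003\n"
    "Subject: first\n"
    "\n"
    "hello\n"
    "\n"
    "From bob@example.com Thu Jan 30 21:17:57 2003\n"
    "Subject: second\n"
    "\n"
    "world\n"
)
parts = split_mbox(sample)
print(len(parts))
```

Joining the pieces back together reproduces the original text, so nothing is 
lost in the split.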

Using formail is slow because bogofilter must be run separately for each 
message.  Each run incurs the cost of loading the program, opening the 
databases, etc.  Caching in the operating system can/should/will lessen 
the penalty, but there are still costs.
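
The arithmetic behind that can be sketched in a few lines of Python.  The 
function total_seconds and the timing figures are hypothetical, picked only to 
show the shape of the trade-off; they are not measurements of bogofilter.

```python
def total_seconds(messages, startup, per_message, one_process):
    """Estimate wall time for handling `messages` emails.

    startup      -- seconds to load the program and open the databases
    per_message  -- seconds of actual work per email
    one_process  -- True if a single run handles every message
    """
    runs = 1 if one_process else messages
    return runs * startup + messages * per_message


# Hypothetical figures, for illustration only.
per_run = total_seconds(1000, startup=0.05, per_message=0.002, one_process=False)
single = total_seconds(1000, startup=0.05, per_message=0.002, one_process=True)
print(per_run, single)
```

With one run per message the startup cost is paid a thousand times; with a 
single run it is paid once, and only the per-message work scales.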

What the autodaemon module offers is a bogofilter daemon running all the 
time, with a small program to pass the email to it and receive results 
from it.  This is a big win because it reduces the cost of starting 
bogofilter for each message.  It is also a win for busy systems that may 
have multiple instances of bogofilter running.
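
As a rough sketch of that arrangement (not the autodaemon code itself), here 
is a long-running server with a small client in Python.  Everything in it is 
an assumption for illustration: the names classify, Handler and ask_daemon, 
the toy word list standing in for the databases, and the use of a local TCP 
socket as the channel.

```python
import socket
import socketserver
import threading

# Stand-in for the word databases; loaded once when the daemon starts.
SPAM_WORDS = {"viagra", "lottery", "winner"}


def classify(message):
    """Toy scorer (not bogofilter's algorithm): flag any known spam word."""
    words = set(message.lower().split())
    return "spam" if words & SPAM_WORDS else "ham"


class Handler(socketserver.StreamRequestHandler):
    def handle(self):
        # The client closes its write side after sending, so read() ends at EOF.
        message = self.rfile.read().decode("utf-8", "replace")
        self.wfile.write(classify(message).encode("ascii"))


def ask_daemon(address, message):
    """The 'small program': pass one email to the daemon, return its verdict."""
    with socket.create_connection(address) as conn:
        conn.sendall(message.encode("utf-8"))
        conn.shutdown(socket.SHUT_WR)  # signal end of message
        with conn.makefile("rb") as reply:
            return reply.read().decode("ascii")


# Port 0 asks the OS for any free port; a real daemon would use a fixed address.
server = socketserver.ThreadingTCPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

verdicts = [ask_daemon(server.server_address, m)
            for m in ("You are a lottery winner", "Lunch on Friday?")]
print(verdicts)

server.shutdown()
server.server_close()
```

The word list is set up once, and each request only pays for a connection and 
the scoring itself, which is the saving described above.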

David




