autodaemon

Matthias Andree matthias.andree at gmx.de
Fri Jan 31 04:41:33 CET 2003


Chris Wilkes <cwilkes-bf at ladro.com> writes:

> Perhaps this is better for the -dev list, but what about reading /
> writing from the DB4 files?  Would each BF process have access to the
> two .db files you've read into memory and are writing out new tokens
> to?

The files are accessed on disk; the DB library does some caching, as does
the kernel. If you don't modify the database, it's fast.

> Could you load up the .db files into a shared memory segment and then
> continue on as normal without modifying BF's code too much?

Not easily, unless Gyepi has an idea. :-)

> I would think a lot of BF's "slowness" (doesn't seem too slow to me!)
> is having to read in and modify the database files.  Is there any way
> to prove / disprove this?

Compile and link with -pg -ax and run gprof after bogofilter has
completed to get some profile data.
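
As a rough illustration of that workflow, here is a toy program that can
be profiled the same way. It is not bogofilter code: hash_token(),
hash_all() and the PROFILE_DEMO_MAIN guard are made-up names, and the
build commands in the comment assume a GNU toolchain and use only -pg.

```c
/* profile_demo.c: toy stand-in for a token lookup loop, NOT bogofilter code.
 * Assumed build and run steps (GNU toolchain):
 *   cc -pg -DPROFILE_DEMO_MAIN -o profile_demo profile_demo.c
 *   ./profile_demo                   (writes gmon.out on normal exit)
 *   gprof ./profile_demo gmon.out
 */
#include <stddef.h>
#include <stdio.h>

/* Hypothetical hot spot: a simple multiplicative string hash. */
unsigned long hash_token(const char *s)
{
    unsigned long h = 5381;
    while (*s != '\0')
        h = h * 33 + (unsigned char)*s++;
    return h;
}

/* Hash every token `rounds` times so the work shows up in the profile. */
unsigned long hash_all(const char *const *tokens, size_t n, unsigned rounds)
{
    unsigned long sum = 0;
    for (unsigned r = 0; r < rounds; r++)
        for (size_t i = 0; i < n; i++)
            sum += hash_token(tokens[i]);
    return sum;
}

#ifdef PROFILE_DEMO_MAIN
int main(void)
{
    const char *const tokens[] = { "viagra", "meeting", "bogofilter" };
    /* Print the result so the compiler cannot discard the loop. */
    printf("%lu\n", hash_all(tokens, 3, 1000000u));
    return 0;
}
#endif
```

In the gprof output, the flat profile should attribute most of the time
to hash_token(); for bogofilter itself, the interesting question is how
much of the time lands in the database calls.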

> I would think if you kept those files in memory that would save a lot
> on access times.  Perhaps have a process that flushed out the memory
> to disk every N writes or every M minutes.

There are kernel caches, so as mail flows in at a high rate, the
relevant data will tend to be cached, and particularly without the "-u"
mode, it will be pretty quick.

Another approach would be to use the transactional interface of
BerkeleyDB, but I don't know yet how well that scales.

> Course I'm just making all this up on the fly so maybe it's a bad idea to
> do.  I also think writing the email into read-only shared memory and
> having multiple spam, virus, and other checkers running on it would save
> disk i/o resources.

That would be really important. OTOH, you can mount your shmfs/tmpfs/mfs
and put an intermediate queue into RAM if you make sure that you have a
copy on disk at all times (that might be an unfiltered/unscanned copy
though).
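
A minimal sketch of that idea, assuming POSIX and two directories chosen
by the caller (one on disk, one on a RAM-backed file system): the disk
copy is written and fsync()ed first, and only then is the RAM copy
created for the scanners to read. write_file() and queue_message() are
made-up names, not part of bogofilter or any MTA.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Write len bytes of buf to path; fsync before closing if durable != 0.
 * Returns 0 on success, -1 on error. For brevity, an interrupted write
 * (EINTR) is treated as an error rather than retried. */
int write_file(const char *path, const char *buf, size_t len, int durable)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    size_t off = 0;
    while (off < len) {
        ssize_t n = write(fd, buf + off, len - off);
        if (n < 0) {
            close(fd);
            return -1;
        }
        off += (size_t)n;
    }
    if (durable && fsync(fd) != 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}

/* Store the message on disk first (synced), then in the RAM queue.
 * If the disk copy fails, no RAM copy is created. */
int queue_message(const char *disk_dir, const char *ram_dir,
                  const char *name, const char *msg, size_t len)
{
    char path[4096];
    int n = snprintf(path, sizeof path, "%s/%s", disk_dir, name);
    if (n < 0 || (size_t)n >= sizeof path)
        return -1;
    if (write_file(path, msg, len, 1) != 0)
        return -1;

    n = snprintf(path, sizeof path, "%s/%s", ram_dir, name);
    if (n < 0 || (size_t)n >= sizeof path)
        return -1;
    return write_file(path, msg, len, 0);
}
```

A fully durable version would also fsync() the disk directory after
creating the file, and a real queue would need unique names and cleanup
of the RAM copy once all checkers are done.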

OTOH, the MTA itself will cause some synchronous operations, so I wonder
if bogofilter's asynchronous operation in modes other than -n, -N, -s,
-S and -u really matters.

> Maybe there are portability issues with this between Linux and BSD.

Shared memory maybe, and there's still Solaris, HP-UX, AIX, IRIX on
bigger machines with their own class of portability issues.

-- 
Matthias Andree



