bogofilter setup for 50000+ users

Matthias Andree matthias.andree at gmx.de
Sat Jul 21 16:40:53 CEST 2007


Bryan Loniewski wrote:

>> A couple of questions:
>>
>> What OS will squirrelmail be running on?  What database will bogofilter
>> be using?
> 
> We run squirrelmail on Solaris 9 boxes. We planned on using Berkeley
> DB (4.2.52).

Well, the NFS implementations in recent Solaris releases are apparently
quite reliable, but I have never tried to use them for serious stuff
such as fcntl()-based record locking across NFS. Sun have a good
reputation for their NFS locking, but I'm not in a position to say with
any authority whether it is sufficient for either Berkeley DB or
SQLite3, and even less so for QDBM.
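
For readers unfamiliar with the term, here is a minimal sketch of what
fcntl()-based record locking looks like, written in Python for brevity.
It only illustrates the locking pattern on a local file and says nothing
about whether the same calls behave correctly over NFS. The helper names
(with_record_lock, bump, demo) and the 8-byte counter layout are
assumptions made up for this example, not anything bogofilter or
Berkeley DB uses.

```python
import fcntl
import os
import tempfile


def with_record_lock(path, offset, length, update):
    """Exclusively lock bytes [offset, offset+length), rewrite them, unlock."""
    fd = os.open(path, os.O_RDWR)
    try:
        # Blocks until no other process holds a conflicting lock on the range.
        fcntl.lockf(fd, fcntl.LOCK_EX, length, offset, os.SEEK_SET)
        try:
            old = os.pread(fd, length, offset)
            new = update(old)
            os.pwrite(fd, new, offset)
            os.fsync(fd)
            return new
        finally:
            fcntl.lockf(fd, fcntl.LOCK_UN, length, offset, os.SEEK_SET)
    finally:
        os.close(fd)


def bump(raw):
    """Hypothetical update: treat the record as a big-endian counter."""
    return (int.from_bytes(raw, "big") + 1).to_bytes(len(raw), "big")


def demo():
    # Two 8-byte records, both zero; only the second one is locked and bumped.
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(bytes(16))
        path = f.name
    try:
        with_record_lock(path, 8, 8, bump)
        return with_record_lock(path, 8, 8, bump)
    finally:
        os.unlink(path)
```

The lock covers only the byte range being rewritten, so other processes
can work on other records in the same file concurrently. That is exactly
the kind of operation that depends on the NFS lock manager when the file
lives on a remote file system.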

SleepyCat used to recommend against using NFS or AFS or other remote
file systems for storing databases, UNLESS they support FULL standard
POSIX filesystem semantics, including mutexes and memory-mapped files.
Even then, the database cannot be shared between different computers:
at most one computer may access the database at a time.
(Check docs/ref/env/remote.html in your Berkeley DB directory,
/usr/local/BerkeleyDB.4.2 on my computer.)

> So would you recommend we use sqlite3? Do we lose/gain anything by
> using sqlite3 vs. Berkeley?

I think neither is recommended for NFS use, so it boils down to a matter
of taste or looking at how big your iron is :-)

The vendor's statement for SQLite3 is essentially the same as
SleepyCat's, namely "avoid [NFS] if multiple processes might try to
access the file at the same time" <http://sqlite.org/faq.html#q5>

sqlite3 uses fewer files (just one database file and one additional
-journal file while there are pending transactions) and is IMVHO easier
to maintain, as you can do fewer things wrong than with BerkeleyDB - but
that matters to inexperienced users (end users) rather than to
large-scale system administrators. BerkeleyDB uses the database files,
__db* environment files and log.* files in transactional mode (and I'd
advise against using the traditional, i.e. non-transactional, mode).

On the other hand, sqlite3 is a bit slower than BerkeleyDB. The
difference is less than an order of magnitude for most operations, but
definitely noticeable on a crowded server. I haven't run tests recently,
but a factor of 2 or 3 would not surprise me. An SQL database is an
entirely different beast from Berkeley DB, which we use as a simple
B-something-tree (key,value) store (although our adaptor is optimized to
avoid most bottlenecks, with the help of D. Richard Hipp - thanks to
him). Essentially, bogofilter using SQLite3 maps the same data into a
two-column table with two BLOBs - not exactly how you'd usually use an
SQL database, but way easier to implement.
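
To make the two-BLOB mapping concrete, here is a rough sketch in Python
of a (key,value) store layered on a two-column SQLite table. The table
name "wordlist", the column names, and the helper functions are
assumptions for illustration only. bogofilter's actual schema and C
adaptor may differ.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a two-column BLOB table acting as a key/value store."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS wordlist ("
        " key BLOB PRIMARY KEY,"
        " value BLOB NOT NULL)"
    )
    return conn


def put(conn, key, value):
    """Insert or overwrite one record inside its own transaction."""
    with conn:  # commits on success, rolls back on an exception
        conn.execute(
            "INSERT OR REPLACE INTO wordlist (key, value) VALUES (?, ?)",
            (key, value),
        )


def get(conn, key):
    """Return the stored bytes for key, or None if the key is absent."""
    row = conn.execute(
        "SELECT value FROM wordlist WHERE key = ?", (key,)
    ).fetchone()
    return None if row is None else row[0]
```

Every lookup goes through the SQL layer and the primary-key index rather
than straight to a B-tree page, which is one plausible source of the
overhead compared with using Berkeley DB directly.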

HTH - feel free to ask further questions.

-- 
Matthias Andree


