simplicity vs safety with complexity

Pavel Kankovsky peak at argo.troja.mff.cuni.cz
Thu Jan 27 01:01:50 CET 2005


On 25 Jan 2005, Tom Anderson wrote:

> 3) a wordlist that's simple, easy to backup, and offers crash protection

You can achieve that goal rather easily. When you need to update the db,
you make a copy, change that copy, call fsync() on it to make sure the data
has been written to the disk (*), and replace the db with the updated copy
using rename(). (**)
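The copy / fsync() / rename() sequence can be sketched roughly as follows.
This is only an illustration under stated assumptions, not Bogofilter code:
the function name update_db_atomically and the apply_changes callback are
hypothetical, the db is assumed to be a single file, and the final fsync()
on the directory is an extra step (POSIX only) that the description above
does not mention.

```python
import os
import shutil
import tempfile


def update_db_atomically(db_path, apply_changes):
    """Copy db_path, let apply_changes(copy_path) modify the copy,
    flush the copy to disk, then rename it over the original.

    Hypothetical helper for illustration; assumes a single-file db.
    """
    db_dir = os.path.dirname(os.path.abspath(db_path))

    # The copy must live on the same filesystem as the db,
    # otherwise the final rename cannot be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=db_dir, prefix=".db-update-")
    os.close(fd)
    try:
        shutil.copy2(db_path, tmp_path)   # make a copy (data + permissions)
        apply_changes(tmp_path)           # change that copy

        # fsync(): ask the OS to push the copy's data to the disk.
        with open(tmp_path, "rb+") as f:
            os.fsync(f.fileno())

        # rename(): readers see either the old db or the new one.
        os.replace(tmp_path, db_path)
    except BaseException:
        # On any failure the original db is untouched; drop the copy.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise

    # Extra step (assumption, POSIX only): flush the directory entry
    # so the rename itself survives a crash.
    dir_fd = os.open(db_dir, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

A caller would pass a function that edits the file at the path it is given,
e.g. update_db_atomically("wordlist.db", add_batch), where add_batch is
whatever applies the batch of changes to the copy.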

Unfortunately, this would be pretty slow for small updates. On the other 
hand, it can be an acceptable modus operandi for large batch updates 
(and many people who do batch training rather than online training are 
already doing it).

(*) Of course, this assumes the OS or HW is not cheating (I shall not name
any cheap harddisks with write-back cache enabled by default in order to
make their benchmark results better...). But any mechanism would lose if
either of them were cheating, ergo this is an acceptable assumption.

(**) Of course, this assumes the filesystem is able to recover to a
consistent and reasonable state after the crash. Again, any db store
sitting on top of a filesystem needs the fs to behave reasonably,
ergo this is an acceptable assumption too.

> Let's work on that.  For now, I'm still using 0.92.8, and it works
> great.  I don't have the time or energy right now to battle with the
> problems others are having with transactions.  Could we perhaps try a
> different database?  All of my linux systems have MySQL installed... how
> about that?

Well, none of my Linux systems has MySQL installed and running, and, to be
honest, I'd hate to have to set up and run it on every machine just to be
able to use Bogofilter. (On the other hand, I would not mind if it was an
optional feature.)

I am afraid there is no easy solution. If you want a safe db then you have
to pick one of two ways:

1. simple non-transactional db store with "file-level" transactions
   and batch updates (or very slow online updates),

2. online updates and a complex transactional db store (using either an
   embedded db engine like Berkeley DB or SQLite or a client-server db
   engine like MySQL or PostgreSQL...or Oracle (***)).

(***) Just kidding. :)
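To make option 2 a bit more concrete, here is a minimal sketch of online
updates on top of an embedded transactional engine, using SQLite through
Python's standard sqlite3 module. Everything specific in it is an assumption
made for illustration: the tokens table, its columns, and the function names
open_wordlist and register_message are hypothetical and are not Bogofilter's
actual schema or API.

```python
import sqlite3


def open_wordlist(path):
    """Open (or create) a hypothetical wordlist db with one row per token."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tokens ("
        " word TEXT PRIMARY KEY,"
        " spam INTEGER NOT NULL DEFAULT 0,"
        " ham INTEGER NOT NULL DEFAULT 0)"
    )
    conn.commit()
    return conn


def register_message(conn, words, is_spam):
    """Bump the spam or ham count of every word in one transaction."""
    column = "spam" if is_spam else "ham"   # fixed set, never user input
    # "with conn" commits if the block succeeds and rolls back if it
    # raises, so a failure mid-message leaves no half-applied update.
    with conn:
        for word in words:
            conn.execute(
                "INSERT OR IGNORE INTO tokens (word) VALUES (?)", (word,)
            )
            conn.execute(
                f"UPDATE tokens SET {column} = {column} + 1 WHERE word = ?",
                (word,),
            )
```

Each message costs one small transaction instead of a full copy of the db,
which is the trade-off described above: cheap online updates, paid for with
a much more complex storage engine underneath.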


--Pavel Kankovsky aka Peak  [ Boycott Microsoft--http://www.vcnet.com/bms ]
"Resistance is futile. Open your source code and prepare for assimilation."



