simplicity vs safety with complexity

Andras Salamon andras at dns.net
Tue Jan 25 06:30:09 CET 2005


On Mon, Jan 24, 2005 at 07:49:00PM -0500, David Relson wrote:
> 1) a wordlist that's simple, easy to backup, but vulnerable to software
> and hardware crashes; or
> 
> 2) a wordlist that offers crash protection but is complex to maintain,
> backup, ... 

Definitely 1.  It's the reason I still have not upgraded from 0.92.8
(and have no current plans to, even with all the fixes in 0.93).

> Over time it became apparent that there were some issues with Berkeley
> DB of which two were significant.  With the locking available, any
> number of programs could read the database, but writing it required a
> single program to get exclusive read/write access.

That's fine for me.  I run a read-only database on our mail server to
prefilter user mail, and maintain a read-write database on my desktop
which is only written by a single bogofilter process at a time.
Occasionally I synch this with the server copy for the benefit of
everyone.

Even if there is corruption in the database, I'm confident enough in
the weights that I no longer even look at the definite-spam file.  I do
scan the unsures and am seeing about 1% ham in there.  False negatives
(spam that lands in the ham folder) are below 1%.

> Additionally, if a
> program had the database open (for writing) and there was a program or
> system crash, the database could become corrupt.

Not a problem for me so far.

> Third, as messages are added to the database, the logfiles grow rapidly.
>  This causes problems for some users.  Dealing with the logfiles
> requires learning additional Berkeley DB commands such as db_checkpoint
> and db_archive.

And even with all this extra complexity, transactions still don't seem
to help with merging different users' changes into a central database.
Please correct me if I am wrong!

> One option is having transactions disabled as the
> default and, as database problems are encountered, have enabling
> transactions as a solution.

This sounds worthwhile.

The main feature I would love to see would be a way of allowing multiple
users to create deltas to be merged together into a central database.
I don't really care about duplicates being included multiple times,
which would mean a delta with just tokens and +/- counts.  Perhaps this
is a post-1.0 issue.
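
To make the delta idea concrete, here is a minimal sketch in Python.  It
is not bogofilter code and says nothing about the real wordlist format.
It assumes a delta is just a mapping from token to a pair of signed
(spam, ham) count changes; the name merge_delta and the in-memory dict
standing in for the central wordlist are made up for illustration.

```python
def merge_delta(central, delta):
    """Apply one user's delta to the central counts, in place.

    central: dict mapping token -> (spam_count, ham_count)   (hypothetical layout)
    delta:   dict mapping token -> (spam_change, ham_change), changes may be negative
    """
    for token, (d_spam, d_ham) in delta.items():
        spam, ham = central.get(token, (0, 0))
        # Clamp at zero so an unregistration can never push a count negative.
        spam = max(0, spam + d_spam)
        ham = max(0, ham + d_ham)
        if spam == 0 and ham == 0:
            # Drop tokens that no longer carry any counts.
            central.pop(token, None)
        else:
            central[token] = (spam, ham)
    return central


# Two users submit deltas against the same central wordlist.
central = {"viagra": (10, 0), "meeting": (0, 5)}
delta_a = {"viagra": (2, 0), "lunch": (0, 1)}
delta_b = {"meeting": (1, -1), "viagra": (-12, 0)}

for delta in (delta_a, delta_b):
    merge_delta(central, delta)
```

Since each delta only adds or subtracts counts, a message that two users
both train on is simply counted twice, which is the duplicate behaviour
I said I could live with.  The zero floor is my own assumption about how
to handle a subtraction larger than the stored count.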

-- Andras Salamon                   andras at dns.net


