Converting old wordlist.db to Berkeley format?

Geoff capsthorne at yahoo.co.uk
Mon Sep 5 11:58:34 CEST 2016


On Mon, 5 Sep 2016 01:19:44 +0200
Matthias Andree <matthias.andree at gmx.de> wrote:

<snip>
> 
> This pretty much looks like multiple non-matching instances of Berkeley
> DB and/or bogofilter on the computer (try "which -a bogofilter"), and in
> particular bogofilter not using the same Berkeley DB version - as though
> an old and a new version were accessing the same database. bogofilter -V
> or bogoutil -V will report its own version, and the Berkeley DB version
> they are using. Be sure to use the same Berkeley DB version (library,
> utilities) consistently when you use the db_* tools and bogofilter.
>
> > db_verify was a bit of a nuisance in that it demanded more mutex space than
> > was allocated by the system, and sometimes would not run at all for that
> > reason. Google shows that this is an old issue and (apparently) unrelated
> > to the problem I was trying to solve.
>
> I can reproduce one part of the problem, and explain it.
> 
> The bogoutil -l created log files, the wordlist.db.new file, and "the
> environment" in the __db.00* (001 and counting) files.
> In this environment, Berkeley DB stores the filename, "wordlist.db.new".
> 
> Now you've renamed the file to "wordlist.db".
> But the __db.* files still point to the "wordlist.db.new" file - which
> is no longer there.  That causes db_verify to fail.
> 
> There are several remedies, bogoutil --db-recover, bogoutil
> --db-remove-environment, even bogoutil --db-verify - after running
> either, then db_verify ~/.bogofilter/wordlist.db also succeeds.
> db_verify fails between the rename and using bogofilter or bogoutil or
> running db_recover. In the end they all ditch the stale __db.* files,
> which resolves the situation because the database itself is not corrupted.
> 
> BTW, you can get rid of log files that are no longer needed by:
> 
>     bogoutil --db-prune=$HOME/.bogofilter
> 
> Remember: If you copy databases that use transactions, you need to copy
> all the log.* files.
> 
> Note: /not/ using transactions - which is your current setup - means
> that *ANY* crash of computer or bogofilter software can corrupt the
> database. The transaction stuff is meant to make it crashproof.
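
The stale-environment situation Matthias describes can be sketched roughly as
below. This is only an illustration: list_env_files is a made-up helper name,
and the commands in the trailing comments are the ones named in the quoted
text (the "=directory" form for --db-remove-environment is assumed to mirror
the --db-prune example).

```shell
#!/bin/sh
# Sketch only: list the Berkeley DB environment (region) files, __db.*,
# found in a bogofilter directory. The function name is hypothetical.
list_env_files() {
    dir=$1
    for f in "$dir"/__db.*; do
        # If nothing matches, the glob is left unexpanded, so check existence.
        [ -e "$f" ] && printf '%s\n' "$f"
    done
    return 0
}

# Possible use after renaming wordlist.db.new to wordlist.db:
#   list_env_files "$HOME/.bogofilter"      # shows the stale __db.00* files
#   bogoutil --db-remove-environment="$HOME/.bogofilter"   # form assumed
#   db_verify "$HOME/.bogofilter/wordlist.db"
```

If the listing is empty, there is no environment left that could still point
at the old file name.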

Thank you Matthias

The computer on which bogofilter is now running is new - I assembled it a week
ago and installed Arch with its bogofilter package 1.2.4-2. Berkeley DB (Arch
package 5.3.28-3) was installed as a dependency of that and other packages.  My
system should (obviously) be pristine if the Arch configuration is correct.
When I compile my own packages I put them in my own "/mysys" tree (compiling
with prefix=/mysys etc.) and run the binary from there.  That is something I
have been doing ever since I came to Linux, so that, so far as possible, my own
experiments / idiocy don't screw up whatever distro I am using.  This computer
never had bogofilter in that tree until I compiled my own version two days ago,
and I did that only after removing the Arch package.
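
The check Matthias suggests ("which -a bogofilter", then -V on each copy) could
be sketched like this for systems where "which -a" is not available. The
function name find_all_on_path is made up for the example; only the -V option
comes from the quoted text.

```shell
#!/bin/sh
# Sketch only: print every executable called NAME found along $PATH,
# in search order. Runs in a subshell so IFS and "set -f" do not leak.
find_all_on_path() (
    name=$1
    IFS=:
    set -f          # do not glob-expand PATH entries
    for d in $PATH; do
        d=${d:-.}   # an empty PATH entry means the current directory
        if [ -f "$d/$name" ] && [ -x "$d/$name" ]; then
            printf '%s\n' "$d/$name"
        fi
    done
)

# Report the version of each copy found; does nothing if none is installed.
find_all_on_path bogofilter | while IFS= read -r p; do
    printf '== %s\n' "$p"
    "$p" -V
done
```

More than one "==" line, or differing Berkeley DB versions in the output, would
point to the mismatch described above.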

So far as the integrity of my wordlist.db is concerned, I understand the risk
from a crash, but whenever I compact wordlist.db (every month or so), I
first copy the existing wordlist.db to a safe place.  I would not lose
much if I had to fall back on that.
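
For anyone wanting to script that routine, a minimal sketch might look like the
following. The function name backup_wordlist and the backup directory are
invented for the example, and the compaction commands in the comments are the
usual dump-and-reload approach, which may differ from what is actually run here.

```shell
#!/bin/sh
# Sketch only: copy a wordlist to a backup directory under a timestamped
# name, verify the copy, and print the path of the backup.
backup_wordlist() {
    src=$1
    dest_dir=$2
    stamp=$(date +%Y%m%d-%H%M%S)
    dest="$dest_dir/$(basename "$src").$stamp"
    mkdir -p "$dest_dir" || return 1
    cp -p "$src" "$dest" || return 1
    # Make sure the copy is byte-identical before touching the original.
    cmp -s "$src" "$dest" || return 1
    printf '%s\n' "$dest"
}

# Possible use (non-transactional setup, so there are no log.* files to copy):
#   cd "$HOME/.bogofilter"
#   backup_wordlist wordlist.db "$HOME/bogofilter-backups" || exit 1
#   bogoutil -d wordlist.db | bogoutil -l wordlist.db.new
#   mv wordlist.db.new wordlist.db
```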

It would be tedious, but if need be I could also retrain.  As soon as I
understood (back in 2003) that training could be a slow process, I began to
archive my spam in case I needed to start again. (I already archived everything
else for purposes of my profession.)  I have about 93K spam mails, and rather
more ham, archived.

Geoff

