Converting old wordlist.db to Berkeley format?
Geoff
capsthorne at yahoo.co.uk
Mon Sep 5 11:58:34 CEST 2016
On Mon, 5 Sep 2016 01:19:44 +0200
Matthias Andree <matthias.andree at gmx.de> wrote:
<snip>
>
> This pretty much looks like multiple non-matching instances of Berkeley
> DB and/or bogofilter on the computer (try "which -a bogofilter"), and in
> particular bogofilter not using the same Berkeley DB version - as though
> an old and a new version were accessing the same database. bogofilter -V
> or bogoutil -V will report its own version, and the Berkeley DB version
> they are using. Be sure to use the same Berkeley DB version (library,
> utilities) consistently when you use the db_* tools and bogofilter.
> > db_verify was a bit of a nuisance in that it demanded more mutex space than
> > was allocated by the system, and sometimes would not run at all for that
> > reason. Google shows that this is an old issue and (apparently) unrelated
> > to the problem I was trying to solve.
> I can reproduce one part of the problem, and explain it.
>
> Running bogoutil -l created log files, the wordlist.db.new file, and "the
> environment" in the __db.00* files (__db.001 and counting).
> In this environment, Berkeley DB stores the filename, "wordlist.db.new".
>
> Now you've renamed the file to "wordlist.db".
> But the __db.* files still point to the "wordlist.db.new" file - which
> is no longer there. That causes db_verify to fail.
>
> There are several remedies: bogoutil --db-recover, bogoutil
> --db-remove-environment, or even bogoutil --db-verify. After running any
> of them, db_verify ~/.bogofilter/wordlist.db also succeeds.
> db_verify fails only between the rename and the next run of bogofilter,
> bogoutil or db_recover. In the end they all discard the stale __db.* files,
> which resolves the situation because the database itself is not corrupted.
>
> BTW, you can get rid of log files that are no longer needed by:
>
> bogoutil --db-prune=$HOME/.bogofilter
>
> Remember: If you copy databases that use transactions, you need to copy
> all the log.* files.
>
> Note: /not/ using transactions - which is your current setup - means
> that *ANY* crash of computer or bogofilter software can corrupt the
> database. The transaction stuff is meant to make it crashproof.
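For anyone repeating the compaction, the order of operations described above (rename, then let bogoutil discard the stale environment, then verify) can be scripted. This is a minimal Python sketch under several assumptions: the database lives in one directory such as ~/.bogofilter, bogofilter is not processing mail while it runs, and bogoutil accepts --db-remove-environment=DIRECTORY in the same form as the --db-prune example above. finish_rename is a hypothetical helper name, and the runner parameter exists only so the commands can be checked without executing them.

```python
import os
import subprocess
from pathlib import Path


def finish_rename(bogodir, runner=subprocess.run):
    """Hypothetical helper: swap in wordlist.db.new, then clear the stale environment."""
    bogodir = Path(bogodir)
    new_db = bogodir / "wordlist.db.new"
    db = bogodir / "wordlist.db"
    # os.replace renames atomically within one filesystem and overwrites
    # the old wordlist.db, so take a backup before calling this.
    os.replace(new_db, db)
    # The __db.* environment files still name "wordlist.db.new"; let bogoutil
    # remove them instead of deleting them by hand (assumed option syntax).
    runner(["bogoutil", f"--db-remove-environment={bogodir}"], check=True)
    # With the stale environment gone, db_verify on the renamed file should pass.
    runner(["db_verify", str(db)], check=True)
```

Usage would be something like finish_rename(os.path.expanduser("~/.bogofilter")). Any of the other remedies mentioned (bogoutil --db-recover, bogoutil --db-verify) could be substituted for the second command.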
Thank you, Matthias.
The computer on which bogofilter is now running is new - I assembled it a week
ago and installed Arch with its bogofilter package 1.2.4-2. Berkeley DB (Arch
package 5.3.28-3) was installed as a dependency of that and other packages. My
system should (obviously) be pristine if the Arch configuration is correct. When
I compile my own packages I put them in my own "/mysys" tree (compiling with
prefix=/mysys etc.) and run the binaries from there. I have been doing that ever
since I came to Linux, so that, as far as possible, my own experiments / idiocy
don't screw up whatever distro I am using. This computer never had bogofilter in
that tree until I compiled my own version two days ago, and I did that only
after removing the Arch package.
So far as the integrity of my wordlist.db is concerned, I understand the risk
from a crash, but whenever I compact wordlist.db (every month or so), I
first copy the existing wordlist.db to a safe place. I would not lose
much if I had to fall back on that.
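That pre-compaction copy can be automated. This is a minimal sketch, assuming the wordlist lives in a directory such as ~/.bogofilter and bogofilter is idle while it runs. backup_bogodir is a hypothetical name. Following Matthias's point about transactions, it copies any log.* files along with wordlist.db. It skips the __db.* environment files on the assumption that the Berkeley DB tools recreate them when needed.

```python
import shutil
import time
from pathlib import Path


def backup_bogodir(bogodir, safe_place):
    """Copy wordlist.db plus any log.* files into a new timestamped folder.

    The __db.* environment files are deliberately not copied (assumption:
    bogofilter/bogoutil recreate the environment on next use).
    """
    src = Path(bogodir)
    db = src / "wordlist.db"
    if not db.is_file():
        raise FileNotFoundError(db)
    # Copy the database first, then the transaction logs in name order.
    files = [db] + sorted(p for p in src.glob("log.*") if p.is_file())
    dest = Path(safe_place) / time.strftime("bogofilter-%Y%m%d-%H%M%S")
    # mkdir without exist_ok refuses to overwrite an existing backup folder.
    dest.mkdir(parents=True)
    copied = []
    for f in files:
        shutil.copy2(f, dest / f.name)  # copy2 keeps timestamps
        copied.append(f.name)
    return dest, copied
```

For example, backup_bogodir(Path.home() / ".bogofilter", "/some/safe/place") returns the new folder and the list of copied file names. The destination path is only a placeholder.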
It would be tedious, but if need be I could also retrain. As soon as I
understood (back in 2003) that training could be a slow process, I began to
archive my spam in case I needed to start again. (I already archived everything
else for purposes of my profession.) I have about 93K spam mails archived, and
rather more ham.
Geoff