database copying and compacting
Matthias Andree
matthias.andree at gmx.de
Mon Nov 15 23:24:53 CET 2004
"Pavel Kankovsky" <peak at argo.troja.mff.cuni.cz> writes:
> On Sun, 7 Nov 2004, David Relson wrote:
>
>> Now, with 0.93.0 it's necessary to save log files and use dd (with
>> proper block size) when copying the database. Thus copying becomes:
>>
>> SIZE=`db_stat -h $SRC -d wordlist.db | grep "page size" | cut -f 1`
>> cp $SRC/log* $SRC/__db.* $DST
>> for FILE in $SRC/*.db ; do
>> dd bs=$SIZE if=$FILE of=$DST/`basename $FILE`
>> done
>
> According to Berkeley DB's "Database and log file archival"
> (http://www.sleepycat.com/docs/ref/transapp/archival.html)
> hot database backup (*) should copy db files BEFORE log files and the
> order is important.
Check the brand-new db_tar script (currently in CVS only) - it does just
that: it tars up the databases (as returned by db_archive -s) and then
the logs (as returned by db_archive -l) to stdout, and optionally removes
the unneeded logs after - or, not recommended, before - the backup.
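For reference, a minimal sketch of a hot backup in that order. The function name and arguments are illustrative; db_archive -s (database files) and -l (log files) are the flags described above, and real db_archive output may use full pathnames:

```shell
#!/bin/sh
# Sketch of a hot backup in the required order: database files first,
# then log files.  hot_backup is an illustrative name, not part of
# Berkeley DB or bogofilter.
hot_backup () {
    dbhome=$1      # Berkeley DB environment directory
    out=$2         # destination tar archive
    ( cd "$dbhome" &&
      tar cf - `db_archive -s -h .` `db_archive -l -h .` ) > "$out"
}
```

Because the log list is only obtained after the databases have been written to the archive, every update the copied db files contain is covered by the logs that follow them.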
> The requirement makes sense to me: if you copy log
> files first and db files next, you might end up with db files containing
> data from updates missing in log files and the result will be
> unrecoverable. On the other hand, if you copy db files first and log
> files next, all data in db files can be either committed or rolled back
> using the information in log files (at least unless you drop some log
> files in the middle of backup).
>
> Moreover, it might be a good idea to add something like
> db_recover -c -h $DST in order to put the destination db into a
> consistent state.
No. bogofilter -f $DST is possible, but db_recover MUST NOT be used on
live databases. Running recovery underneath a running application wreaks
real havoc like you've never seen before. I tried that on a copy of my
database to find out how bad it really was, and it was worse than I
expected. Boom.
> (*) I assume this is what you intend to do because dd is necessary to
> guarantee page-level read consistency wrt concurrent writes.
Yup. The actual problem was that on some systems, cp(1) used mmap(2)
which then goofed the isolation up and caused non-atomic reads. Half a
page new, the other half stale. It's less severe when the database has
been created under Berkeley DB 4.1 - 4.3, because we request page
checksums for those versions, so a torn page is at least detected;
4.0 and older don't support checksums.
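A minimal sketch of such a page-atomic copy (page_copy is an illustrative name; the 4096 fallback is only a placeholder, the real page size should come from db_stat as in the script quoted above):

```shell
# page_copy: copy a database file in whole page-sized blocks so that
# each Berkeley DB page is read atomically, avoiding the half-new,
# half-stale pages that mmap-based cp(1) can produce on a live database.
page_copy () {
    src=$1; dst=$2
    pagesize=${3:-4096}   # real value: db_stat -d file | grep "page size"
    dd bs="$pagesize" if="$src" of="$dst" 2>/dev/null
}
```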
--
Matthias Andree