database copying and compacting

Matthias Andree matthias.andree at gmx.de
Sat Nov 13 00:38:22 CET 2004


David Relson <relson at osagesoftware.com> writes:

> Matthias,
>
> Sanity check please :-)
>
> The copying and compacting of databases has gotten more complex with the
> new release.  BerkeleyDB's Transaction capability generates log files
> which need to be included when copying and compacting.  I suspect I'll
> have db_copy and db_compact scripts before much longer.  Before I go too
> far in that direction, I wanted to check my understanding with you.
>
> 1) With 0.92.8, database copying was as simple as:
>
>     cp $ORIG/wordlist.db $NEW/

> Now, with 0.93.0 it's necessary to save log files and use dd (with
> proper block size) when copying the database.  Thus copying becomes:
>
>    SIZE=`db_stat -h $ORIG -d wordlist.db | grep "page size" | cut -f 1` 
>    cp $SRC/log* $SRC/__db.* $DST  
>    for FILE in $SRC/*.db ; do
>        dd bs=$SIZE if=$FILE of=$DST/`basename $FILE`
>    done
>
> The for loop supports multiple databases, e.g. wordlist.db and
> ignore.db, in $SRC

Right, except that you'd compute SIZE inside the loop, and quote the
variables:

    set -e
    cp "$SRC"/log.* "$SRC"/__db.* "$DST"
    for FILE in "$SRC"/*.db ; do
        SIZE=`db_stat -d "$FILE" | grep "page size" | cut -f 1`
        dd bs="$SIZE" if="$FILE" of="$DST"/`basename "$FILE"`
    done

The dd is there so that the database can be recovered if the copying
process fails.
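Since you mentioned wanting db_copy and db_compact scripts: here is a
rough sketch of the copy recipe as a function. The function name, the
subshell body and set -e are my additions, so treat this as an untested
sketch; it assumes db_stat prints the page size as the first
tab-separated field of the line matching "page size".

```shell
# Hypothetical "db_copy" wrapper around the recipe above (sketch only).
db_copy() (
    set -e
    SRC=$1
    DST=$2
    # carry the environment and log files over verbatim
    cp "$SRC"/log.* "$SRC"/__db.* "$DST"
    # copy each database page-aligned so a partial copy is recoverable
    for FILE in "$SRC"/*.db ; do
        SIZE=`db_stat -d "$FILE" | grep "page size" | cut -f 1`
        dd bs="$SIZE" if="$FILE" of="$DST"/`basename "$FILE"`
    done
)
```

You'd call it as "db_copy /path/to/old /path/to/new", with both
directories already existing.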

If you don't want the logs, I think you can do this instead:

    set -e
    db_checkpoint -1h "$SRC"
    for FILE in "$SRC"/*.db ; do
        SIZE=`db_stat -d "$FILE" | grep "page size" | cut -f 1`
        dd bs="$SIZE" if="$FILE" of="$DST"/`basename "$FILE"`
    done

That's it. The environment will be recreated from scratch in the new
location, "$DST".

> 2) With 0.92.8, database compacting (with backup) looked like:
>
>     bogoutil -d $ORIG/wordlist.db > $NEW/wordlist.txt
>     bogoutil -l $NEW/wordlist.db  < $NEW/wordlist.txt
>     mv $ORIG/wordlist.db $ORIG/wordlist.db.orig
>     mv -f $NEW/wordlist.db $ORIG/wordlist.db

In traditional mode, yes. For concurrent mode, the same procedure as for
transactional mode applies.

> Now, with 0.93.0 it's:
>
>     bogoutil -d $ORIG/wordlist.db > $NEW/wordlist.txt
>     bogoutil -l $NEW/wordlist.db  < $NEW/wordlist.txt
>     mv $ORIG/wordlist.db $ORIG/wordlist.db.orig
>     mv -f $NEW/wordlist.db $ORIG/wordlist.db

What's this good for? You can't just copy a wordlist.db file into an
existing environment; it would get confused, and I'd rather not see the
consequences. Just omit the two mv commands.

The rest isn't necessary.

>     cd $NEW
>     db_checkpoint -1 -h .
>     rm -f `db_archive -h .`

You've moved the database file out of the directory, so these operations
are pointless.

How about:

    set -e
    mkdir "$NEW"
    cp "$ORIG"/DB_CONFIG "$NEW" || true
    bogoutil -d "$ORIG"/wordlist.db > "$NEW"/wordlist.txt
    bogoutil -l "$NEW"/wordlist.db  < "$NEW"/wordlist.txt
    rm "$NEW"/wordlist.txt
    db_checkpoint -1h "$NEW"
    db_archive -dh "$NEW"
    mv "$ORIG" "$ORIG".old
    mv "$NEW" "$ORIG"
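And the compacting recipe wrapped the same way, for the db_compact
script you had in mind. Again, the function name, the subshell body and
set -e are my additions; an untested sketch, not shipped bogofilter
code.

```shell
# Hypothetical "db_compact" wrapper around the steps above (sketch only).
db_compact() (
    set -e
    ORIG=$1
    NEW=$2
    mkdir "$NEW"
    cp "$ORIG"/DB_CONFIG "$NEW" || true   # keep DB_CONFIG if there is one
    bogoutil -d "$ORIG"/wordlist.db > "$NEW"/wordlist.txt
    bogoutil -l "$NEW"/wordlist.db  < "$NEW"/wordlist.txt
    rm "$NEW"/wordlist.txt
    db_checkpoint -1h "$NEW"              # flush the logs into the database
    db_archive -dh "$NEW"                 # delete logs no longer needed
    mv "$ORIG" "$ORIG".old                # keep the old tree as a backup
    mv "$NEW" "$ORIG"
)
```

Something like "db_compact ~/.bogofilter ~/.bogofilter.new", say.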

> As I understand it, db_checkpoint ensures that the log file contents are
> included in the database (as far as possible) and db_archive lists log
> files that can be deleted.

True. db_archive -d will remove them.

-- 
Matthias Andree
