database corruption
David Relson
relson at osagesoftware.com
Thu May 29 19:03:45 CEST 2003
At 11:48 AM 5/29/03, Fred Yankowski wrote:
>On Thu, May 29, 2003 at 03:02:37PM +0200, Matthias Andree wrote:
> > People who see that bogoutil -d or db_dump loops can try to rescue data
> > (no promises though) from their data bases with:
> >
> > cp goodlist.db goodlist.db.bak
> > db_dump -r goodlist.db >good.raw
> > db_load good.new.db <good.raw
> > rm -f goodlist.db
> > bogoutil -d good.new.db | bogoutil -l goodlist.db
> > rm -f good.new.db good.raw
>
>Thank you for that procedure. I ran it over my spamlist.db and now
>db_verify reports no errors on it.
>
>But what is that penultimate step for? That is, what good is this:
>
> bogoutil -d good.new.db | bogoutil -l goodlist.db

As you already know, this line dumps and reloads the wordlist. Since the
dump orders the tokens alphabetically, reloading it creates a nice clean
database, i.e. one of the smallest possible size.

>I used db_dump to dump good.new.db and goodlist.db to raw data after
>the above. Comparing that raw data with diff gives results like this:
>
> 6c6
> < 01000000
> ---
> > 0100000041a43101
> 8c8
> < 02000000
> ---
> > 0200000041a43101
>
>It looks like the db_dump / db_load sequence loses data that is
>recovered somehow by doing the "bogoutil -d" / "bogoutil -l" sequence.
>What is going on there?

I don't know what those values are. How about a unified diff, i.e. "diff
-u", to give us some clues?

"bogoutil -d | bogoutil -l" doesn't create any additional symbols. My
guess is that db_dump is showing timestamps or other bookkeeping information.
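
One way to test the timestamp guess is to decode the hex values from the
diff above. The sketch below assumes, without confirmation from bogofilter's
source, that each value is a little-endian 32-bit count optionally followed
by a little-endian 32-bit date; decode_value is a hypothetical helper name.
Under that assumption the extra four bytes read as 20030529, which matches
the date of this thread in YYYYMMDD form.

```python
import struct


def decode_value(hex_value):
    """Decode a db_dump hex value as (count, date).

    Assumed layout (not confirmed by the thread): a little-endian
    32-bit count, optionally followed by a little-endian 32-bit date.
    """
    raw = bytes.fromhex(hex_value)
    if len(raw) == 4:
        (count,) = struct.unpack("<I", raw)
        return count, None
    if len(raw) == 8:
        count, date = struct.unpack("<II", raw)
        return count, date
    raise ValueError("unexpected value length: %d bytes" % len(raw))


if __name__ == "__main__":
    # The four values shown in the diff output quoted above.
    for value in ("01000000", "0100000041a43101",
                  "02000000", "0200000041a43101"):
        print(value, decode_value(value))
```

If that reading is right, the counts are identical on both sides of the
diff and only the date field differs, so no token data would have been lost
by the db_dump / db_load sequence.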