database corruption

David Relson relson at osagesoftware.com
Thu May 29 19:03:45 CEST 2003


At 11:48 AM 5/29/03, Fred Yankowski wrote:
>On Thu, May 29, 2003 at 03:02:37PM +0200, Matthias Andree wrote:
> > People who see that bogoutil -d or db_dump loops can try to rescue data
> > (no promises though) from their data bases with:
> >
> > cp goodlist.db goodlist.db.bak
> > db_dump -r goodlist.db >good.raw
> > db_load good.new.db <good.raw
> > rm -f goodlist.db
> > bogoutil -d good.new.db | bogoutil -l goodlist.db
> > rm -f good.new.db good.raw
>
>Thank you for that procedure.  I ran it over my spamlist.db and now
>db_verify reports no errors on it.
>
>But what is that penultimate step for?  That is, what good is this:
>
>         bogoutil -d good.new.db | bogoutil -l goodlist.db

As you already know, this line dumps and reloads the wordlist.  Since the 
dump orders the tokens alphabetically, reloading it creates a nice clean 
database, i.e. one of the smallest size.

>I used db_dump to dump good.new.db and goodlist.db to raw data after
>the above.  Comparing that raw data with diff gives results like this:
>
>         6c6
>         <  01000000
>         ---
>         >  0100000041a43101
>         8c8
>         <  02000000
>         ---
>         >  0200000041a43101
>
>It looks like the db_dump / db_load sequence loses data that is
>recovered somehow by doing the "bogoutil -d" / "bogoutil -l" sequence.
>What is going on there?

I don't know what those values are.  How about a unified diff, i.e. "diff 
-u", to give us some clues?
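
The comparison could be scripted roughly as follows. This is only a sketch: dump_diff is a hypothetical helper name, the DUMP variable is my own addition so that another dump tool can be substituted, and the file names are the ones from the procedure quoted above.

```shell
#!/bin/sh
# Dump two databases to text and print a unified diff of the dumps.
# DUMP defaults to Berkeley DB's db_dump; it is read at call time,
# so it can be overridden before calling dump_diff.
DUMP=${DUMP:-db_dump}

dump_diff() {
    old=$1
    new=$2
    tmp=${TMPDIR:-/tmp}/dumpdiff.$$
    mkdir "$tmp" || return 1
    $DUMP "$old" > "$tmp/old.raw"
    $DUMP "$new" > "$tmp/new.raw"
    # diff exits with status 1 when the dumps differ, 0 when identical
    diff -u "$tmp/old.raw" "$tmp/new.raw"
    status=$?
    rm -rf "$tmp"
    return $status
}

# Usage: dump_diff good.new.db goodlist.db
```

The unified output shows a few lines of context around each change, so the key that belongs to each changed value should be visible next to it.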

"bogoutil -d | bogoutil -l" doesn't create any additional symbols.  My 
guess is that db_dump is showing timestamps or other bookkeeping information.
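
One way to test that guess is to decode the values from the diff quoted above. The sketch below assumes the hex strings are consecutive little-endian 32-bit integers; that layout is an assumption on my part, not something db_dump reports, and decode_le32 is a hypothetical helper name.

```shell
#!/bin/sh
# Decode a hex string as consecutive little-endian 32-bit integers.
# Assumption: the input length is a multiple of 8 hex digits and each
# 8-digit group is one 4-byte little-endian word.
decode_le32() {
    hex=$1
    out=
    while [ -n "$hex" ]; do
        word=$(printf '%s' "$hex" | cut -c1-8)
        hex=$(printf '%s' "$hex" | cut -c9-)
        b0=$(printf '%s' "$word" | cut -c1-2)
        b1=$(printf '%s' "$word" | cut -c3-4)
        b2=$(printf '%s' "$word" | cut -c5-6)
        b3=$(printf '%s' "$word" | cut -c7-8)
        # Reverse the byte order, then let the shell convert from hex
        out="$out${out:+ }$(( 0x$b3$b2$b1$b0 ))"
    done
    printf '%s\n' "$out"
}

decode_le32 01000000          # prints: 1
decode_le32 0100000041a43101  # prints: 1 20030529
```

Under that assumption the first word is the same in both dumps, and the extra word decodes to 20030529, which reads like a YYYYMMDD date stamp matching the date of this thread (2003-05-29). That would fit the bookkeeping guess, but it is not confirmed here.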




