database corruption

David Relson relson at osagesoftware.com
Wed May 28 20:31:53 CEST 2003


At 02:20 PM 5/28/03, Dave Lovelace wrote:
>Fred Yankowski wrote:
> >
> > On Wed, May 28, 2003 at 12:27:49PM -0400, David Relson wrote:
> > > The noise indicates a problem.  I don't have solid info on what's
> > > significant and what's not.  One test would be simply to run
> > > "bogoutil -d xxxlist.db".  Output is alphabetical and you could see
> > > how far it gets before it stops/loops/whatever...
> >
> > I tried that and it completes without reporting any error.  The last
> > entries start with the character displayed as umlauted 'y' and encoded
> > as 0xFF (octal 377), which seems pretty far into the collating sequence.
> >
>For me it seemed to be going on forever (on goodlist.db), so I killed
>it & started checking.  There are some things that seem to be out of
>order (but this might be the result of embedded control characters), but
>it definitely starts looping & keeps it up forever until killed.
>
>Any suggestions on how to fix it without losing everything?

Dave,

No good answers for you.  My knowledge of BerkeleyDB is insufficient to 
give a definitive answer.  All I can do is make some suggestions.

1. Check the Sleepycat website, FAQs, etc ...
2. Use db_dump and see how much good info you get.
3. Create a new goodlist.db from all the ham messages you have available.

The above are in no particular order.  I wrote them as I thought of them.
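
For #2, here is a rough sketch of one way to see how much you get before
the looping starts.  It is only an illustration: the helper name
dump_until_repeat is made up, and it assumes that a looping dump shows up
as an exact repeat of an earlier output line.  Tokens with embedded
newlines or control characters could confuse it.

```shell
# Hypothetical helper: copy a text dump from stdin to stdout and stop at
# the first line that has already been seen (i.e. the dump has looped).
# Returns 0 if the dump ended normally, 1 if a repeat was found.
dump_until_repeat() {
    awk 'seen[$0]++ { exit 1 } { print }'
}

# Possible use (file names are examples only):
#   bogoutil -d goodlist.db | dump_until_repeat > goodlist.txt
#   bogoutil -l goodlist.new.db < goodlist.txt
# The same filter could be tried on db_dump output.  I believe db_dump
# also has a salvage option (-r) whose output db_load can read, but
# check the Berkeley DB docs for your version first.
```

Work on a copy of goodlist.db, not the original, and compare the number
of lines in goodlist.txt with what you would expect before trusting it.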

#3 is likely the easiest and quickest, and it will probably also give you
satisfactory results.
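
A minimal sketch of #3, assuming your ham is saved one message per file
in a single directory (adjust if you keep mbox files).  The function
name, the BOGOFILTER override variable, and the directory names are all
made up for the example.  It registers each message as non-spam with
"bogofilter -n" into a fresh directory given with "-d", so the damaged
file is never touched.

```shell
# Hypothetical helper: register every file in a ham directory as
# non-spam, writing the wordlist into a new, separate directory.
#   $1 = directory holding ham messages, one message per file (assumed)
#   $2 = new directory for the rebuilt wordlist
# BOGOFILTER may be set to use a different bogofilter binary.
rebuild_goodlist() {
    ham_dir=$1
    new_dir=$2
    mkdir -p "$new_dir" || return 1
    for msg in "$ham_dir"/*; do
        # Skip subdirectories and the unexpanded pattern of an empty dir.
        [ -f "$msg" ] || continue
        "${BOGOFILTER:-bogofilter}" -n -d "$new_dir" < "$msg" || return 1
    done
}

# Possible use (paths are examples only):
#   rebuild_goodlist "$HOME/Mail/ham" "$HOME/.bogofilter.new"
```

Once the new list looks sane, move the old goodlist.db aside and copy the
rebuilt one into place.  Your spamlist.db is a separate file and can stay
where it is if it dumps cleanly.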

After any rebuild, careful monitoring of message classifications would be 
a very good thing to do, along with correcting any mistakes.

Remember we all started using bogofilter with whatever messages we had 
available at the time.  Initial results may not have been the best 
possible, but we've all seen how bogofilter learns and improves.  I think 
you'll be fine.

David




