bogoutil (performance ?)

David Relson relson at osagesoftware.com
Wed May 28 17:17:10 CEST 2003


At 10:40 AM 5/28/03, T'aZ wrote:

>#db_verify goodlist.db
>db_verify: Out-of-order key, page 1449 item 79
>db_verify: Last item on page 694 sorted greater than parent entry
>db_verify: Last item on page 1010 sorted greater than parent entry
>db_verify: First item on page 694 sorted greater than parent entry
>db_verify: Page 694 linked twice
>db_verify: DB->verify: goodlist.db: DB_VERIFY_BAD: Database verification
>failed
>
> > Can you dump the wordlist, i.e. "bogoutil -d goodlist.db".  If that
> > works, the database is ok.
>
>ugh , scrolled correctly until words beginning with e , then restarted
>from letters b, rescrolling again , then restarting at b etc etc etc
>
>:( seems b0rked :'(
>
>iirc my first version was 0.11.something


You've got a broken database.  We'll probably never know why.  AFAIK the
locking problems were fixed in 0.10, so they shouldn't be the cause if you
started with 0.11.  How large a quantity of email do you deal with?  Which
version of BerkeleyDB are you running?

Anyhow, you can try to recover data using db_dump or, if you have saved ham 
and spam, you can start over and train bogofilter with what you have saved.
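
For the db_dump route, a rough sketch follows.  It assumes the BerkeleyDB
utilities db_dump, db_load and db_verify are on your PATH and that your
db_dump supports -r (salvage).  The name salvage_wordlist is made up for
illustration, and whatever it recovers from a damaged file may be incomplete.

```shell
#!/bin/sh
# Sketch only: salvage_wordlist is an illustrative name, not part of
# bogofilter or BerkeleyDB.

# salvage_wordlist BROKEN_DB REBUILT_DB
# Dump what can be read from BROKEN_DB and load it into a new REBUILT_DB.
salvage_wordlist() {
    broken=$1
    rebuilt=$2
    # Never write on top of an existing file.
    if [ -e "$rebuilt" ]; then
        echo "refusing to overwrite $rebuilt" >&2
        return 1
    fi
    # -r asks db_dump to salvage whatever it can read from a damaged file;
    # db_load rebuilds a fresh database from that dump.
    db_dump -r "$broken" | db_load "$rebuilt" || return 1
    # Confirm the rebuilt file passes verification.
    db_verify "$rebuilt"
}
```

Something like "salvage_wordlist goodlist.db goodlist.new.db" keeps the broken
file untouched, so you can compare "bogoutil -d" output from the new file
before putting it in place.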

A possible precaution would be to snapshot your wordlists 
periodically.  For example, on Sundays "cp -a $BOGOFILTER_DIR save.1" and 
on Mondays "cp -a $BOGOFILTER_DIR save.2" etc.  As part of the cron job, 
check word counts, e.g. "bogoutil -d list.db | wc -l".
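
The snapshot idea above could be sketched as a couple of shell functions
called from a cron job.  This is only a sketch: the function names and the
save.N layout keyed on the day of the week are assumptions for illustration,
and the copy is best made when nothing is writing to the wordlists.

```shell
#!/bin/sh
# Sketch only: function names and directory layout are illustrative.

# snapshot_wordlists SRC_DIR BACKUP_ROOT
# Copy SRC_DIR to BACKUP_ROOT/save.N, where N is the ISO day of the week
# (1 = Monday ... 7 = Sunday), replacing last week's copy for that day.
snapshot_wordlists() {
    src=$1
    root=$2
    dest="$root/save.$(date +%u)"
    mkdir -p "$root" || return 1
    rm -rf "$dest" || return 1
    cp -a "$src" "$dest" || return 1
    echo "$dest"
}

# count_words WORDLIST_DB
# "bogoutil -d" writes one token per line, so counting lines counts words.
count_words() {
    bogoutil -d "$1" | wc -l | tr -d ' '
}
```

A daily cron script might then call, with a backup location of your choosing:

    snapshot_wordlists "$BOGOFILTER_DIR" "$HOME/bogofilter-backups"
    count_words "$BOGOFILTER_DIR/goodlist.db"

A dump that fails, or a count that suddenly drops, is a hint to stop rotating
snapshots before the good copies get overwritten.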

You're the first person to report database corruption in a long 
time.  Hopefully it's a fluke and doesn't happen again.

David




