Bogofilter migration & tuneup
robin-lists at robinbowes.com
Mon Dec 5 09:31:48 EST 2005
Matthias Andree said the following on 05/12/2005 13:57:
> Robin Bowes <robin-lists at robinbowes.com> writes:
>>The thing is, wordlist.txt is currently around 4.7GB in size and
>>growing! The original wordlist.db is 105MB.
>>How can I reduce the size of the wordlist?
> The wordlist.db file is likely corrupt and looping. If you have log
> files for this wordlist.db, then running
> bogoutil.0.93 --db-recover=/path/to/.bogofilter.bak
> should fix this.
> If it does not, retry with --db-recover-harder or see doc/README.db for
> other recovery strategies.
Hmmm. So running bogoutil with --db-prune was not a good idea then? :(
I ran bogoutil with --db-recover anyway, and then with --db-recover-harder.
I'm now trying bogoutil -d wordlist.db > wordlist.txt again to see if it
grows massively again.
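
A rough sketch of how the dump could be guarded so that it stops by itself instead of filling the disk. The function name dump_capped, the BOGOUTIL override variable and the byte cap are my own assumptions for illustration, not anything bogofilter provides; only "bogoutil -d" comes from the thread above.

```shell
# Hypothetical helper: dump a wordlist, but give up once the text output
# grows past a byte cap (a looping/corrupt database dumps without end).
# BOGOUTIL is an assumed override so a different binary can be substituted.
dump_capped() {
    dc_db=$1          # database file to dump
    dc_out=$2         # text file to write
    dc_max=$3         # maximum number of bytes we are willing to accept

    # head -c is not strictly POSIX but is available on GNU and BSD systems.
    # Reading one byte past the cap lets us tell "exactly at cap" from "over".
    "${BOGOUTIL:-bogoutil}" -d "$dc_db" | head -c "$((dc_max + 1))" > "$dc_out"

    # Arithmetic expansion strips the padding some wc implementations print.
    dc_size=$(($(wc -c < "$dc_out")))
    if [ "$dc_size" -gt "$dc_max" ]; then
        echo "dump of $dc_db exceeded $dc_max bytes; wordlist may be corrupt" >&2
        return 1
    fi
}
```

Something like "dump_capped wordlist.db wordlist.txt 1000000000" would then fail with a message rather than run away. The right cap is a judgment call; a generous multiple of the size of wordlist.db seems a reasonable starting point.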
> My apologies if the option is actually named differently, I haven't
> looked at 0.93.5 in a while.
Me neither - it's just been sat there working for me!
>>One last thing, on the old machine, the .bogofilter directory "filled
>>up" with loads of DB log files. I'm not really interested in keeping all
>>of them. Is the correct way to keep these in check to use a cron task
>>running "bogoutil --db-prune" ?
> That would work with older bogofilter versions, you don't need this
> after the upgrade though: 1.0.0 removes log files automatically if they
> are no longer of use (but can be configured to leave these behind if so
Ah, OK. That's a definite improvement.
> should we run the verify method by default before dumping, and if verify
> fails, either try recovery (on TXN) or request the user to use db_dump
> instead (on traditional)? This might be one non-bugfix item I'd be
> willing to let into 1.0 as it improves robustness when users upgrade
> from 1.0.X to 1.1 later. Not that 1.1 is in sight though. :-)
Also, you should perhaps run the verify method before the --db-prune
option is run.
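
Roughly what I have in mind, as a sketch only. It assumes that bogoutil accepts --db-verify=<file> and that --db-prune takes the directory in the same way --db-recover does above; I have not checked either against 0.93.5. The function name verify_then_prune and the BOGOUTIL override variable are made up for illustration.

```shell
# Hypothetical wrapper: only prune the logs if the wordlist verifies first.
# Assumes --db-verify=<file> and --db-prune=<dir>; check the bogoutil man
# page for the installed version before relying on either.
verify_then_prune() {
    vp_dir=$1                          # e.g. /path/to/.bogofilter
    vp_util=${BOGOUTIL:-bogoutil}      # assumed override for the binary name

    if "$vp_util" --db-verify="$vp_dir/wordlist.db"; then
        "$vp_util" --db-prune="$vp_dir"
    else
        # Leave the log files alone: recovery may still need them.
        echo "verify failed for $vp_dir/wordlist.db; skipping prune" >&2
        return 1
    fi
}
```

A cron job on an older version could call this wrapper instead of running "bogoutil --db-prune" directly, so that a damaged database never has its logs removed.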