A weird wordlist.db problem

Tom Eastman tom at cs.otago.ac.nz
Fri Jun 10 05:17:17 CEST 2005


Here's an interesting one... bogofilter has worked beautifully for me for
years, but Berkeley always seems to be a nightmare.

Berkeley DB is some kind of tree structure, right?  Well, it looks as though
I have a branch pointing back into itself, or something like that...

If I attempt a 'bogoutil -d wordlist.db', the output simply continues
forever: my wordlist.db is about five megabytes, but I killed the dump once
the dump file had reached half a *gigabyte*.

Letting a similar dump run for a while and then running 'sort | uniq -c -d'
on it showed large (LARGE) numbers of duplicate lines in the output file.
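To illustrate the check with a tiny, made-up sample (the real input came
from 'bogoutil -d wordlist.db'), duplicated records show up like this:

```shell
# Made-up dump lines standing in for real bogoutil -d output;
# 'uniq -c -d' prints only repeated lines, prefixed with their count.
printf 'alpha 3 1\nbeta 2 0\nalpha 3 1\n' | sort | uniq -c -d
# prints:       2 alpha 3 1
```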

Clearly there is some kind of loop in the data structure causing bogoutil to
run forever.  I *like* my current wordlist, and things, oddly enough, still
seem to work, as far as learning and classification is concerned.

How can I fix this?  How can I recover my database to the point at least
where I can do a dump/reload and make it healthy again?
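For reference, the kind of dump/reload I have in mind would be something
like the sketch below.  This is only a guess at a procedure: it assumes the
stock Berkeley DB utilities (db_recover, db_verify) matching the library
version that built the file are available, and that 'bogoutil -d' actually
terminates once the structure is repaired.

```shell
# Sketch only -- assumes db_recover/db_verify match the Berkeley DB
# version that created wordlist.db, and that the dump terminates after
# repair.  Keep a backup of the original file in any case.
cp wordlist.db wordlist.db.bak

db_recover -h .            # run Berkeley DB recovery in this directory
db_verify wordlist.db      # check the btree for structural damage

# Dump, drop duplicate records, and reload into a fresh database.
bogoutil -d wordlist.db | sort -u > wordlist.txt
bogoutil -l wordlist.new.db < wordlist.txt
mv wordlist.new.db wordlist.db
```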

Thanks,

        Tom





More information about the Bogofilter mailing list