A weird wordlist.db problem
Tom Eastman
tom at cs.otago.ac.nz
Fri Jun 10 05:17:17 CEST 2005
Here's an interesting one... bogofilter has worked beautifully for me for
years, but Berkeley DB always seems to be a nightmare.
Berkeley DB is some kind of tree structure, right? Well, it looks kind of
like I have a branch pointing back into itself, or something like that...
If I attempt a 'bogoutil -d wordlist.db', the output simply continues
forever: my wordlist.db is about five megabytes, but I killed the dump once
the dump file had reached half a *gigabyte*.
Letting a similar dump run for a while and then filtering it through
'sort | uniq -c -d' showed large (LARGE) numbers of duplicate lines in the
output file.
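In case it helps anyone reproduce the check, the duplicate-counting pipeline is just plain text processing ('uniq -c' prefixes each line with its count, '-d' keeps only lines that occur more than once); here it is on some made-up sample lines rather than a real dump:

```shell
# Demonstrate the duplicate check on fake dump lines:
# sort groups identical lines together, uniq -c -d reports only duplicates
# with their occurrence counts.
printf 'foo\nbar\nfoo\nfoo\nbaz\n' | sort | uniq -c -d
```

On a healthy wordlist dump this should print nothing at all, since each token appears exactly once.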
Clearly there is some kind of loop in the data structure causing bogoutil to
run forever. I *like* my current wordlist, and things, oddly enough, still
seem to work, as far as learning and classification is concerned.
How can I fix this? How can I recover my database, at least to the point
where I can do a dump/reload and make it healthy again?
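The dump/reload I have in mind would look something like the sketch below. This is only a guess at a recovery path, not something I've verified: it assumes 'bogoutil -d' and 'bogoutil -l' are the dump and load directions, that duplicate lines in the dump are identical (so 'sort -u' can drop them safely), and that the file names are placeholders. The bogoutil steps are commented out since they need a real wordlist; the deduplication step itself is shown on fake dump lines:

```shell
# Hypothetical recovery sketch -- file names and flags are assumptions.
# 1. Dump the damaged database and collapse the repeated records:
# bogoutil -d wordlist.db | sort -u > wordlist.dump
# 2. Load the deduplicated dump into a fresh database:
# bogoutil -l wordlist.new.db < wordlist.dump
# 3. Swap the new file in only after checking it classifies sensibly.

# The sort -u step is ordinary text processing; demo on fake dump lines:
printf 'foo 3 1\nbar 2 1\nfoo 3 1\n' | sort -u
```

If the duplicate lines are *not* byte-identical (say, differing counts for the same token), 'sort -u' would keep both, and the reload would presumably fail or need a smarter merge.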
Thanks,
Tom