anomalous subjects

David Relson relson at osagesoftware.com
Tue Nov 30 18:34:31 CET 2004


On Tue, 30 Nov 2004 09:44:27 -0500
Tom Anderson wrote:

...[snip]...

> It seems my current one was severely truncated (from 1M tokens to 91k 
> tokens) at some point.  This may have nothing to do with bogofilter,
> as perhaps it became corrupted in another way, so there's no need to
> pursue this here.  I'm just going to continue training on error in
> order to rebuild my wordlist from my backup.
> 
> However, this may be of interest:  running "bogoutil -d | bogoutil
> -l", which was expected to compact the file, instead added 1.5M!  Is
> this normal? Has new information been added to the wordlist between
> 0.17.x and 0.92.8? Both versions pass db_verify.

The dump/load sequence doesn't change the number of tokens in the
wordlist.  Because dump writes tokens in canonical order, piping the
dump output to load gives Berkeley DB the opportunity to build a
wordlist of minimal size.

To see what's happening with your wordlist, I'd dump both versions and
compare what you've got.
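A minimal sketch of that comparison (filenames here are made-up stand-ins;
with real wordlists you'd generate old.txt and new.txt via
"bogoutil -d wordlist.db", which emits one "token count" line per token
in sorted order):

```shell
# Stand-in dump files; real ones would come from "bogoutil -d".
printf 'alpha 3\nbeta 5\n' > old.txt
printf 'alpha 3\nbeta 7\ngamma 1\n' > new.txt

# Quick size check: the token counts should match if nothing was lost.
wc -l old.txt new.txt

# Because dumps are sorted, comm works directly:
# lines present only in the new dump (changed counts or new tokens).
comm -13 old.txt new.txt
```

Any token that shows up only on one side tells you where the two
wordlists diverge.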

FWIW, I just ran this test:

OLD="/var/spool/bogofilter"
NEW="/tmp/test"
mkdir $NEW
bogoutil -d $OLD/wordlist.db | bogoutil -l $NEW/wordlist.db
ls -lh $OLD/wordlist.db $NEW/wordlist.db

with these results:

-rw-r-----  1 root root 60M Nov 30 12:24 $OLD/wordlist.db
-rw-r-----  1 root root 54M Nov 30 12:26 $NEW/wordlist.db

I've never seen dump/load produce a larger database.

HTH,

David





More information about the Bogofilter mailing list