Running 0.11.1

David Relson relson at osagesoftware.com
Wed Mar 5 14:32:25 CET 2003


At 05:12 AM 3/5/03, Boris 'pi' Piwinger wrote:

>Hi!
>
>I just installed the newest version. Due to the changes I
>decided to rebuild my database. Clearly, this reduces the
>ham training base (since only some mails go to that database
>by my mail handling). But I save all my spam. So I am
>surprised by the result:
>
>Before:
>11198464 Mar  5 10:43 goodlist.db
>  4120576 Mar  5 10:43 spamlist.db
>
>After:
>  6623232 Mar  5 10:52 goodlist.db
>  2695168 Mar  5 10:52 spamlist.db
>
>I understand that goodlist becomes smaller, but that much?
>Why does spamlist go down that dramatically?

Wordlist sizes vary and I can't tell you for certain why your new one is 
smaller than the old one, but I do have a theory ...

One of the recent efficiency changes in bogofilter is that it does its own 
token sort before reading or writing the wordlist.  This allows BerkeleyDB 
to make much better use of its cache, i.e. to be much faster.  With already 
sorted keys, BerkeleyDB writes a more compact database (presumably because 
it doesn't have to split data blocks).
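
To illustrate why insertion order can matter, here is a toy simulation of 
B-tree leaf pages.  It is only a sketch, not BerkeleyDB's actual code: the 
page capacity of 8 keys, the function name count_leaf_pages, and the rule 
"appending past the end of the last page starts a fresh page, anything else 
splits a full page in half" are all simplifying assumptions.

```python
import bisect
import random


def count_leaf_pages(keys, capacity=8):
    """Insert distinct keys one by one and return how many leaf pages result.

    Toy model (an assumption, not BerkeleyDB's real algorithm):
    - each page holds at most `capacity` keys, kept sorted
    - appending beyond the end of a full last page starts a new page
    - any other insert into a full page splits that page in half
    """
    pages = [[]]  # leaf pages in key order
    for key in keys:
        # Locate the page whose key range covers `key`.
        firsts = [p[0] for p in pages[1:]]
        i = bisect.bisect_right(firsts, key)
        page = pages[i]

        if len(page) < capacity:
            bisect.insort(page, key)
            continue

        if i == len(pages) - 1 and key > page[-1]:
            # Sorted append: leave the full page alone, open a new one.
            pages.append([key])
            continue

        # Out-of-order insert into a full page: split it in the middle.
        mid = capacity // 2
        left, right = page[:mid], page[mid:]
        bisect.insort(left if key < right[0] else right, key)
        pages[i:i + 1] = [left, right]
    return len(pages)


keys = list(range(1000))
sorted_pages = count_leaf_pages(keys)

shuffled = keys[:]
random.Random(0).shuffle(shuffled)
random_pages = count_leaf_pages(shuffled)

print("pages after sorted inserts:  ", sorted_pages)
print("pages after shuffled inserts:", random_pages)
```

In this model the sorted run packs every page full, while the shuffled run 
leaves half-empty pages behind after each split, so it needs more pages for 
the same keys.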

Most likely the smaller size is due to the above factor.  There is also a 
second possibility: if in the past you used -S and -N to move messages from 
one wordlist to the other, the old wordlists might have had tokens with 
counts of 0.  These tokens wouldn't be in the new (properly built) 
wordlists.
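
The zero-count effect can be pictured with plain dictionaries standing in 
for wordlists.  This is a hypothetical sketch: add_message and 
remove_message are made-up helpers, and it assumes that unregistering a 
message decrements counts without deleting the entries, which is the 
behaviour guessed at above rather than something checked against the 
bogofilter source.

```python
def add_message(wordlist, tokens):
    """Register a message: bump the count of each of its tokens."""
    for token in tokens:
        wordlist[token] = wordlist.get(token, 0) + 1


def remove_message(wordlist, tokens):
    """Unregister a message: lower each count, but keep the entry.

    Assumption: the entry stays in the wordlist even when it reaches 0.
    """
    for token in tokens:
        wordlist[token] = max(0, wordlist.get(token, 0) - 1)


ham = ["meeting", "lunch"]
misfiled = ["free", "offer", "lunch"]  # spam first registered as ham

# Old wordlist: the misfiled message was added, then moved out again.
old_good = {}
add_message(old_good, ham)
add_message(old_good, misfiled)
remove_message(old_good, misfiled)

# Rebuilt wordlist: only the real ham was ever registered.
new_good = {}
add_message(new_good, ham)

zero_tokens = sorted(t for t, n in old_good.items() if n == 0)
print("old entries:", len(old_good), "new entries:", len(new_good))
print("zero-count tokens in old list:", zero_tokens)
```

Both lists carry the same information for scoring, but the old one still 
stores the two zero-count entries.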






