Frequency of wordlist.db reorgs?

David Relson relson at osagesoftware.com
Sun Oct 3 01:59:59 CEST 2004


On Sat, 2 Oct 2004 16:01:07 -0700 (PDT)
Charles Hewson wrote:

> Hi all,
> 	My wordlist grows about 10% each week. Current .MSG_COUNT
> spam 27000 ham 10000. If I do bogoutil -d .... |bogoutil -l ..... it
> reduces the disk usage from 4.21M to 2.30M. Logically this would cost some
> when tokens are added by bogofilter -u. Is this the best way to
> control disk usage? Should I make a weekly cron script? Would tracking
> output of db_stat give helpful input?
> 
> Charles

Hi Charles,

My wordlist floats around 55-60MB.  For 18 months or so, I used the "-u"
(autoupdate) option, so every message went into the database.  Now I use
"-u" with "--thresh-update=0.01" so that easy ham and spam (those
scoring 0.01 or below and those scoring 0.99 and above) don't go into
the wordlist.  Using thresh_update has slowed the wordlist's growth
dramatically.
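
To make the cutoff concrete, here is a small shell sketch of the rule as
described above.  It is not bogofilter's own code: the function name
would_autoupdate is made up for illustration, and it assumes scores in
the range 0 to 1 with the boundaries treated as inclusive.

```shell
# Hypothetical helper (not part of bogofilter): prints "skip" when a
# score lies within THRESH of 0.0 or 1.0 (easy ham or easy spam), and
# "update" otherwise.  THRESH defaults to 0.01.
would_autoupdate() {
    score=$1
    thresh=${2:-0.01}
    awk -v s="$score" -v t="$thresh" 'BEGIN {
        s += 0; t += 0                      # force numeric comparison
        if (s <= t || s >= 1 - t)
            print "skip"
        else
            print "update"
    }'
}
```

For example, "would_autoupdate 0.5" prints "update", while
"would_autoupdate 0.999" prints "skip".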

How often to compact the wordlist is a matter of personal preference.
AFAIK bogofilter's speed isn't noticeably affected by the wordlist's
disk layout.
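
If you do want to compact on a schedule, the dump-and-reload you
describe can be wrapped in a small function.  This is only a sketch
under some assumptions: the function name compact_wordlist, the BOGOUTIL
override variable, and the temporary and backup file names are all made
up for illustration, and it assumes nothing is updating the wordlist
while it runs.

```shell
# Sketch: compact a wordlist by dumping it to text and reloading it
# into a fresh database file (bogoutil -d dumps, bogoutil -l loads).
# BOGOUTIL is a hypothetical override for the bogoutil binary.
compact_wordlist() {
    db=$1
    bogoutil=${BOGOUTIL:-bogoutil}
    dump="$db.dump.$$"      # temporary text dump
    new="$db.new.$$"        # freshly loaded database

    # Each step runs only if the previous one succeeded; the original
    # is copied to a .bak file before being replaced.
    "$bogoutil" -d "$db" > "$dump" &&
    "$bogoutil" -l "$new" < "$dump" &&
    cp -p "$db" "$db.bak" &&
    mv "$new" "$db"
    status=$?

    # Remove leftovers whether or not the steps above succeeded.
    rm -f "$dump" "$new"
    return "$status"
}
```

It could then be called from a weekly cron script as, say,
"compact_wordlist $HOME/.bogofilter/wordlist.db" (adjust the path to
wherever your wordlist lives).  Using an intermediate dump file rather
than a pipe means a failed dump is noticed before anything is replaced.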

I've not paid much attention to db_stat, though I just ran it on copies
of my wordlist from the past week.  Offhand, I don't see any patterns
indicating anything useful :-<

HTH,

David


