Frequency of wordlist.db reorgs?
David Relson
relson at osagesoftware.com
Sun Oct 3 01:59:59 CEST 2004
On Sat, 2 Oct 2004 16:01:07 -0700 (PDT)
Charles Hewson wrote:
> Hi all,
> My wordlist grows about 10% each week. Current .MSG_COUNT
> spam 27000 ham 10000. If I do bogoutil -d .... |bogoutil -l ..... it
> reduces the disk from 4.21M to 2.30M. Logically this would cost some
> when tokens are added by bogofilter -u. Is this the best way to
> control disk usage? Should I make a weekly cron script? Would tracking
> output of db_stat give helpful input?
>
> Charles
Hi Charles,
My wordlist floats around 55-60MB. For 18 months or so, I used the "-u"
(autoupdate) option, so every message went into the database. Now I use
"-u" with "--thresh-update=0.01" so that easy ham and spam (those
scoring 0.01 or below and those scoring 0.99 and above) don't go into
the wordlist. Using thresh_update has slowed the wordlist's growth
dramatically.
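To make that rule concrete, here is a small sketch of the decision as described in this message. It is not bogofilter's own code: the function name would_autoupdate is hypothetical, and how bogofilter treats a score exactly on the boundary is an assumption here.

```shell
#!/bin/sh
# Illustration only: mirrors the rule described above, not bogofilter's source.
# would_autoupdate SCORE THRESH
#   returns 0 (true) if a message with this score would be added to the
#   wordlist, 1 if it is "easy" ham or spam and would be skipped.
would_autoupdate() {
    # s = message score, t = thresh_update value (for example 0.01)
    # Skipped when s <= t (easy ham) or s >= 1 - t (easy spam).
    awk -v s="$1" -v t="$2" 'BEGIN { exit !(s > t && s < 1 - t) }'
}
```

For example, with a threshold of 0.01, "would_autoupdate 0.50 0.01" succeeds, while scores of 0.005 or 0.995 are skipped.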
How often to compact the wordlist is a matter of personal preference.
AFAIK bogofilter's speed isn't noticeably affected by the wordlist's
disk layout.
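If you do want to compact on a schedule, the dump-and-reload pipeline from the question could be wrapped roughly as follows. This is a sketch under assumptions, not a tested recipe: the function name compact_wordlist, the BOGOUTIL variable, and the ".new" and ".bak" file names are all made up for illustration, and the wordlist path is whatever you pass in. It also assumes nothing is updating the wordlist while it runs, since tokens registered between the dump and the final rename would be lost.

```shell
#!/bin/sh
# Sketch: compact a bogofilter wordlist by dumping it and loading the dump
# into a fresh file. BOGOUTIL and the file suffixes are assumptions.
BOGOUTIL="${BOGOUTIL:-bogoutil}"

compact_wordlist() {
    db="$1"              # path to the wordlist (supplied by the caller)
    new="$db.new.$$"     # temporary file next to the original

    # "bogoutil -d" dumps the tokens as text; "bogoutil -l" loads them.
    # Note: plain sh only reports the exit status of the last command in
    # the pipeline, so also check that the new file is not empty.
    if "$BOGOUTIL" -d "$db" | "$BOGOUTIL" -l "$new" && [ -s "$new" ]; then
        # Keep a copy of the old file, then swap the compacted one in.
        cp -p "$db" "$db.bak" && mv "$new" "$db"
    else
        rm -f "$new"
        return 1
    fi
}
```

A weekly cron job would then source this file and call compact_wordlist with the path to your own wordlist.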
I've not paid much attention to db_stat, though I just ran it on copies
of my wordlist from the past week. Offhand, I don't see any patterns
indicating anything useful :-<
HTH,
David