How big is too big for wordlist.db?

Matthias Andree matthias at an3e.de
Sun Oct 10 23:09:03 CEST 2021


On 09.10.21 at 16:25, hput wrote:
> setup: Ubuntu-21.04
> Intel(R) Xeon(R) CPU           X5470  @ 3.33GHz  32GB ram
> 
> I'm a long-time user of bogofilter, yet quite a novice without any
> real skill in using or adjusting it, etc.
> 
> Having a 27 MB wordlist.db seems to slow down mail processing to
> some degree.
> 
>     ls -sh .bogofilter/
>   total 27M
>   27M wordlist.db
> 
> 
> I'd like to know how big is really getting up there with wordlist.db?
> And the simplest way to reduce its size?
> 
> I've only ever used bogofilter by calling it with ~/.procmailrc like
> so:
> 
>   :0fw
>   | /usr/bin/bogofilter -p -l -u -e -v
> 
>   :0
>   * ^X-Bogosity: (Spam|Yes)
>   bogo_spam_tr.in
> 
>   :0
>   * ^X-Bogosity: Unsure
>   bogo-unsure.in
> 
> I've never done any kind of training for bogofilter.

That's not useful; however, in your configuration procmail calls
bogofilter with "-u", a mode in which it automatically registers every
message passing through for which it reaches a clear "spam" or "good"
decision (as opposed to "unsure").  That causes the wordlist to grow
quite a bit.

The slowdown might then be because only one bogofilter instance can
WRITE to the database at a time, and "-u" makes many bogofilter calls
write.
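
As a sketch, here is your filter recipe with only the "-u" dropped, so
that classifying a message reads the database without registering
anything.  The other options are kept exactly as in your recipe; I have
not tested this on your setup:

```
:0fw
| /usr/bin/bogofilter -p -l -e -v
```

The two delivery recipes that match on X-Bogosity can stay as they are.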

Other than that, access times should scale logarithmically with the
size, since bogofilter uses B-tree databases when you compile it with
Berkeley DB, so a multi-MB database in itself shouldn't bog the machine
down.  Writing to spinning-platter HDDs, however, is slow, so you may
want to remove the "-u" and train bogofilter manually, registering only
e-mail that was classified incorrectly or as unsure.
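
Manual training could look roughly like the sketch below.  The function
names are made up for illustration, and it assumes each message has been
saved to its own file; the options themselves (-s, -n, -S, -N) are
bogofilter's register/unregister switches:

```shell
# Hypothetical helper names; each takes the path of ONE saved message.

# Register a message that bogofilter has not registered before
# (for example one it classified as "unsure").
learn_spam() { bogofilter -s < "$1"; }
learn_ham()  { bogofilter -n < "$1"; }

# Correct a message that was already registered in the wrong class,
# e.g. by an earlier "-u" run: remove the old registration, add the new.
fix_to_spam() { bogofilter -N -s < "$1"; }   # was registered as good
fix_to_ham()  { bogofilter -S -n < "$1"; }   # was registered as spam
```

For instance, "fix_to_ham some-message" would move a message that was
auto-registered as spam over to the good side, where some-message is a
placeholder for the saved file.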

-- 
Matthias Andree

