How big is too big for wordlist.db?
Matthias Andree
matthias at an3e.de
Sun Oct 10 23:09:03 CEST 2021
On 09.10.21 at 16:25, hput wrote:
> setup: Ubuntu-21.04
> Intel(R) Xeon(R) CPU X5470 @ 3.33GHz 32GB ram
>
> For a long time user of bogofilter yet quite a novice with any real
> skill in using it or adjusting... etc.
>
> Having a 27 MB wordlist.db seems to slow down mail processing to
> some degree.
>
> ls -sh .bogofilter/
> total 27M
> 27M wordlist.db
>
>
> I'd like to know how big is really getting up there with wordlist.db?
> And the simplest way to reduce its size?
>
> I've only ever used bogofilter by calling it with ~/.procmailrc like
> so:
>
> :0fw
> | /usr/bin/bogofilter -p -l -u -e -v
>
> :0
> * ^X-Bogosity: (Spam|Yes)
> bogo_spam_tr.in
>
> :0
> * ^X-Bogosity: Unsure
> bogo-unsure.in
>
> I've never done any kind of training for bogofilter.
Never training bogofilter by hand is not ideal; however, in your
configuration procmail calls bogofilter with "-u", so whenever it reaches
a clear "spam" or "good" decision (as opposed to "unsure") for a message
passing through, it registers that message automatically. That causes the
wordlist to grow quite a bit.
The slowdown might then be because only one bogofilter instance can
WRITE to the database at a time, and "-u" makes many bogofilter calls
write.
Other than that, access times should scale logarithmically with the
size: bogofilter uses B-tree databases when you compile it with
Berkeley DB, so a multi-MB database in itself shouldn't bog the machine
down. Writing to spinning-platter HDDs, however, is slow, so you may want
to remove the "-u" and train bogofilter only by manually correcting
incorrectly classified or unsure e-mail.
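
A minimal sketch of that change, assuming the same bogofilter path and
mailbox names as in the quoted ~/.procmailrc; the only difference is that
"-u" is dropped from the filter recipe:

```
# Classify only: without -u, bogofilter reads wordlist.db but does not
# register the message, so this recipe no longer writes to the database.
:0fw
| /usr/bin/bogofilter -p -l -e -v

:0
* ^X-Bogosity: (Spam|Yes)
bogo_spam_tr.in

:0
* ^X-Bogosity: Unsure
bogo-unsure.in
```

Misfiled or unsure messages could then be registered by hand, for example
with "bogofilter -s < message" for spam or "bogofilter -n < message" for
non-spam. A message that "-u" had already registered in the wrong class
would presumably need "-Ns" or "-Sn" instead, to undo the old registration
first; check the bogofilter man page for your version before relying on this.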
--
Matthias Andree