Converting old wordlist.db to Berkeley format?

Geoff capsthorne at yahoo.co.uk
Mon Sep 5 20:41:36 CEST 2016


On Mon, 5 Sep 2016 19:15:55 +0100
RW <rwmaillists at googlemail.com> wrote:
<snip>

> IIWY I'd train a new wordlist on 2000 recent spams and 2000 recent
> hams and then run trainbogo.sh on the larger corpus with something
> like:
> 
> ham_cutoff = 0.001
> spam_cutoff = 0.99
> 
> trainbogo.sh does a train-on-error pass through the corpus, which is
> much faster than training on everything.
> 
> Old, very heavily trained wordlists are not always optimal. They may
> contain a lot of training that's no longer relevant, and they can
> become over-trained and resistant to change. 
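
The train-on-error pass described in the quoted advice can be sketched
roughly as below. This is only an illustration of the idea, not
bogofilter's algorithm and not what trainbogo.sh actually runs:
ToyWordlist, its smoothed naive-Bayes score, and train_on_error are
hypothetical names made up for the example. Only the two cutoff values
come from the post.

```python
import math
from collections import Counter

# Cutoff values quoted in the post.
HAM_CUTOFF = 0.001
SPAM_CUTOFF = 0.99


class ToyWordlist:
    """Hypothetical stand-in for a wordlist (NOT bogofilter's real scoring)."""

    def __init__(self):
        self.spam = Counter()  # token -> number of spam messages containing it
        self.ham = Counter()   # token -> number of ham messages containing it
        self.n_spam = 0
        self.n_ham = 0

    def train(self, text, is_spam):
        """Register one message as spam or ham."""
        tokens = set(text.lower().split())
        if is_spam:
            self.spam.update(tokens)
            self.n_spam += 1
        else:
            self.ham.update(tokens)
            self.n_ham += 1

    def score(self, text):
        """Return a spam probability in [0, 1] from smoothed token counts."""
        log_odds = 0.0
        for tok in set(text.lower().split()):
            p_spam = (self.spam[tok] + 1) / (self.n_spam + 2)
            p_ham = (self.ham[tok] + 1) / (self.n_ham + 2)
            log_odds += math.log(p_spam / p_ham)
        # Clamp so math.exp cannot overflow on long messages.
        log_odds = max(-50.0, min(50.0, log_odds))
        return 1.0 / (1.0 + math.exp(-log_odds))


def train_on_error(wordlist, corpus):
    """One pass over (text, is_spam) pairs.

    A message is trained only if the wordlist does not already classify it
    correctly beyond the relevant cutoff. Returns the number trained.
    """
    trained = 0
    for text, is_spam in corpus:
        s = wordlist.score(text)
        confident = s >= SPAM_CUTOFF if is_spam else s <= HAM_CUTOFF
        if not confident:
            wordlist.train(text, is_spam)
            trained += 1
    return trained
```

Calling train_on_error(wordlist, corpus) on a wordlist already seeded
with a few thousand recent messages would register only the messages
that still score inside the wide unsure band between the two cutoffs,
which is why such a pass touches far fewer messages than training on
the whole corpus.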

Thanks for that.  I keep the archives in files of 1K emails, so it would be easy
to do that.  For a long time all of my work and personal email passed through my
main computer here at home, so volumes were high.  The work arrangement has
changed now, so there is relatively little traffic, not to mention that so many
Linux project mailing lists are much quieter than they once were.  My current
wordlist works very well for my current needs, but I will bear the advice in
mind.

Geoff

