DB won't train, 'Unsure' about everything
Matthias Andree
matthias.andree at gmx.de
Mon Feb 17 21:49:11 CET 2025
On 17.02.25 at 16:54, Adrian via bogofilter wrote:
> Since my previous thread I've narrowed down the problem. I can now
> describe the specific issue in the hope someone can explain it.
>
> I created a wordlist.db from a text dump, and it doesn't work.
>
> $ bogofilter -s -v -B <any file>
> # 0 words, 0 messages
What is this "any file" that you give it? Does bogofilter understand
its file format? Did you give it an empty file? Which bogofilter version
are you running?
Could you rerun it with lots of "-v" and maybe a few -x options? Maybe
-vvvxbcdgu will enlighten us all. For -B I will definitely want the
"reader" bit, -xb.
The meanings of the debug flags for -x can be seen here
https://gitlab.com/bogofilter/bogofilter/-/blob/main/bogofilter/src/debug.h?ref_type=heads
if bogofilter/src/debug.h isn't handy.
> $ bogofilter -t -v -B <any file>
> <any file> U 0.520000
>
> db_verify says it's OK
>
> The source text dump looks OK, though it has a lot of non-ASCII such as
> AU<C2><F2> 0 1 20230305 (as displayed by less)
What is the encoding? There should be an .ENCODING token in the text dump.
Also, what are the spam and ham message counts? This should tell you:
bogoutil -d ~/.bogofilter/wordlist | grep MSG_COUNT
bogofilter only works properly if it has both ham and spam messages,
else the Bayesian maths won't work and bogofilter falls back to "Unsure".
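
To make the two checks above concrete, here is a minimal sketch in Python that scans text-dump lines for the .ENCODING and .MSG_COUNT tokens. It is not part of bogofilter; the function names are made up for illustration. It assumes each dump line has the form "token count count date" (as in the AU line quoted above) and that the first count is spam and the second is ham; that column order is an assumption, so please check it against your bogoutil documentation. The sample lines are hypothetical, not taken from the poster's database.

```python
def read_msg_count(dump_lines):
    """Return (first_count, second_count) from the .MSG_COUNT line, or None.

    Lines are handled as bytes because tokens may contain non-ASCII bytes.
    Assumption: the first count is spam, the second is ham.
    """
    for raw in dump_lines:
        fields = raw.split()
        if len(fields) >= 3 and fields[0] == b".MSG_COUNT":
            return int(fields[1]), int(fields[2])
    return None


def has_encoding_token(dump_lines):
    """True if some line starts with the .ENCODING token."""
    return any(line.split()[:1] == [b".ENCODING"] for line in dump_lines)


def can_classify(counts):
    """Both spam and ham message counts must be non-zero."""
    return counts is not None and counts[0] > 0 and counts[1] > 0


# Hypothetical dump: one token line in the style quoted above, plus a
# .MSG_COUNT line recording a single ham message and no spam.
sample = [
    b"AU\xc2\xf2 0 1 20230305",
    b".MSG_COUNT 0 1 20230305",
]

counts = read_msg_count(sample)
print("message counts:", counts)
print("has .ENCODING:", has_encoding_token(sample))
print("can classify:", can_classify(counts))
```

For a real check, the lines could come from the dump file opened in binary mode, e.g. list(open(path, "rb")), where path is wherever you saved the dump.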
> Should I chuck Berkeley DB and install the Sqlite bogofilter?
>
> And why should a text dump that loads without error result in a DB that
> doesn't work??!
You didn't show the bogoutil -l output, so I don't know. :-)