wordlist.db problem
Tom Allison
tallison at tacocat.net
Fri Jun 18 12:27:06 CEST 2004
OTR Comm wrote:
> Hello,
>
> I have just joined this list, and the list archive site was not working,
> so I couldn't check the archive for this problem.
>
> What would make my wordlist.db file not update?
>
> I have a message that is definitely spam and tried to train bogofilter
> on the database with:
>
> bogofilter -d .bogofilter -s < spam/spam.31514
>
> and then ran
>
> bogofilter -u -e -p -d .bogofilter < spam/spam.31514
>
> but the message still came back
>
> X-Bogosity: No, tests=bogofilter, spamicity=0.520000, version=0.90.0
>
> I have just started using bogofilter (within the past month), so I am
> not sure what is going on here. That is, why the database didn't
> update. The timestamp changed, so I know something happened.
>
> Even when I tried bogofilter -d .bogofilter -Ns < spam/spam.31514, the
> database wasn't updated correctly.
>
>
It sounds as though you expect bogofilter to learn the difference between
spam and ham decisively from a single training message. Bogofilter is a
statistical filter: training on one email will rarely change that email's
classification with certainty unless its score was already very near
the cutoff.
You mentioned that the timestamp changed, so I think the database did change.
To verify, run:
bogoutil -d .bogofilter/wordlist.db | grep MSG_COUNT
or
bogoutil -w .bogofilter/wordlist.db
and enter ".MSG_COUNT"
I get: .MSG_COUNT 7311 14807
Now, run some spam through using your favorite technique and check the
MSG_COUNT again. It should have changed.
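
The before/after check could be scripted roughly as below. This is only a
sketch: the helper names msg_count and counts_changed are made up for
illustration, and it assumes the dump prints the token followed by its two
counters, as in the line I quoted above.

```shell
# Print the two counters from the .MSG_COUNT line of a bogoutil dump.
# Reads the dump text on stdin; prints nothing if the line is absent.
# (msg_count is a hypothetical helper, not part of bogofilter.)
msg_count() {
    awk '$1 == ".MSG_COUNT" { print $2, $3; exit }'
}

# Compare two snapshots taken with msg_count and report the result.
# (counts_changed is likewise a hypothetical helper.)
counts_changed() {
    if [ "$1" != "$2" ]; then
        echo "changed: $1 -> $2"
    else
        echo "unchanged: $1"
    fi
}

# Usage sketch (needs bogofilter installed; paths as in the original post):
#   before=$(bogoutil -d .bogofilter/wordlist.db | msg_count)
#   bogofilter -d .bogofilter -s < spam/spam.31514
#   after=$(bogoutil -d .bogofilter/wordlist.db | msg_count)
#   counts_changed "$before" "$after"
```

If the report says "unchanged" after a training run, the message was not
registered at all, which is a different problem from the score not moving.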
I noticed that your commands did two things:
first you told bogofilter that the message was indeed spam (-s), then you told
it to classify the message and update the wordlist accordingly (-u). The
MSG_COUNT should therefore have been updated twice.
Also bear in mind that, because of robs and min_dev, not every token added to
the database has an immediate effect: a token may still be too "new"
(seen too few times) or too close to "unsure" to count.
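
To make that concrete, here is a rough sketch of Robinson-style smoothing
for a single token. It is not bogofilter's actual code. The function name
token_score is made up, and the robs, robx and min_dev values in the
usage lines are illustrative, not necessarily the defaults of your version.
It assumes robs > 0 and non-zero message totals.

```shell
# token_score SPAM_HITS HAM_HITS SPAM_MSGS HAM_MSGS ROBS ROBX MIN_DEV
# Prints the smoothed score f(w) and whether the token would be counted.
#   p(w) = (b/nb) / (b/nb + g/ng)      raw estimate from the wordlist
#   f(w) = (s*x + n*p(w)) / (s + n)    pulled toward robx (x) by robs (s)
# A token is ignored when |f(w) - 0.5| < min_dev.
token_score() {
    awk -v b="$1" -v g="$2" -v nb="$3" -v ng="$4" \
        -v s="$5" -v x="$6" -v d="$7" 'BEGIN {
        n  = b + g                         # times the token has been seen
        pb = b / nb                        # fraction of spam containing it
        pg = g / ng                        # fraction of ham containing it
        p  = (n > 0) ? pb / (pb + pg) : x  # unseen token: fall back to robx
        f  = (s * x + n * p) / (s + n)
        dev = f - 0.5
        if (dev < 0) dev = -dev
        printf "%.4f %s\n", f, (dev >= d ? "used" : "ignored")
    }'
}

# A token seen once, in spam only, with illustrative settings:
token_score 1 0 7311 14807 1 0.52 0.375    # prints: 0.7600 ignored
# The same token after ten spam sightings:
token_score 10 0 7311 14807 1 0.52 0.375   # prints: 0.9564 used
```

With these example settings, one training pass leaves a brand-new token
inside the min_dev band, so the message score does not move until the token
has been seen more often.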