wordlist.db problem

Tom Allison tallison at tacocat.net
Fri Jun 18 12:27:06 CEST 2004


OTR Comm wrote:
> Hello,
> 
> I have just joined this list, and the list archive site was not working,
> so I couldn't check the archive for this problem.
> 
> What would make my wordlist.db file not update?
> 
> I have a message that is definitely spam and tried to train bogofilter
> on the database with:
> 
> bogofilter -d .bogofilter -s < spam/spam.31514
> 
> and then ran
> 
> bogofilter -u -e -p -d .bogofilter < spam/spam.31514
> 
> but the message still came back
> 
> X-Bogosity: No, tests=bogofilter, spamicity=0.520000, version=0.90.0
> 
> I have just started using bogofilter (within the past month), so I am
> not sure what is going on here.  That is, why the database didn't
> update.  The timestamp changed, so I know something happened.
> 
> Even when I tried bogofilter -d .bogofilter -Ns < spam/spam.31514, the
> database wasn't updated correctly.
> 
> 

It sounds like you expect bogofilter to learn the difference between 
spam and ham from a single training message.  Bogofilter is a 
statistical classifier.  Training on one email will rarely move that 
email's spamicity decisively unless its score is already very near 
the cutoff.

You mentioned that the timestamp changed, so I think the database did 
change.  To verify, run:

bogoutil -d .bogofilter/wordlist.db | grep MSG_COUNT
or
bogoutil -w .bogofilter/wordlist.db
and enter ".MSG_COUNT"
I get:  .MSG_COUNT                       7311  14807


Now, run some spam through using your favorite technique and check the 
MSG_COUNT again.  It should have changed.
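
To make the before/after comparison less error-prone, something like the 
following might help.  It is only a sketch: msg_counts is a hypothetical 
helper name, and it assumes the "token  number  number" column layout of 
the .MSG_COUNT line shown above.

```shell
# Print the two counts from a .MSG_COUNT line read on stdin.
# Assumes the layout shown above: token, then two numeric columns.
msg_counts() {
    awk '$1 == ".MSG_COUNT" { print $2, $3 }'
}

# Usage sketch (needs bogoutil/bogofilter and the paths from this thread):
#   before=$(bogoutil -d .bogofilter/wordlist.db | msg_counts)
#   bogofilter -d .bogofilter -s < spam/spam.31514
#   after=$(bogoutil -d .bogofilter/wordlist.db | msg_counts)
#   [ "$before" != "$after" ] && echo "updated: $before -> $after"
```

If the two values differ after training, the wordlist was written to, 
whatever the spamicity of the message turns out to be.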

I noticed that in your scripts you did two things: first you told 
bogofilter that the email was indeed spam, then you told it to classify 
the message and update the wordlist accordingly.  That should have 
updated MSG_COUNT twice.

Also bear in mind that with robs and min_dev, not all tokens added to 
the database will have an immediate effect, because they are still too 
close to being "new" or "unsure".
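
A rough illustration of why that happens, as far as I understand the 
Robinson-style calculation: a token's score is smoothed as 
f(w) = (robs*robx + n*p) / (robs + n), and tokens whose score is within 
min_dev of 0.5 are left out.  Here robx (the score assumed for an unknown 
token), n (how often the token has been seen) and p (its raw spam 
probability) are my additions for the example, the function names are 
hypothetical, and the numbers are made up rather than bogofilter defaults.

```shell
# token_score ROBS ROBX N P
# Smoothed estimate f(w) = (robs*robx + n*p) / (robs + n).
token_score() {
    awk -v s="$1" -v x="$2" -v n="$3" -v p="$4" \
        'BEGIN { printf "%.4f\n", (s * x + n * p) / (s + n) }'
}

# token_counts SCORE MIN_DEV
# Succeeds (exit 0) when SCORE is at least MIN_DEV away from the neutral 0.5.
token_counts() {
    awk -v f="$1" -v d="$2" \
        'BEGIN { dev = f - 0.5; if (dev < 0) dev = -dev; exit !(dev >= d) }'
}

# Illustrative values only: robs=1, robx=0.5, min_dev=0.3, token seen
# only in spam (p=1).
#   token_score 1 0.5 1 1     seen once:  0.7500, within 0.3 of 0.5, ignored
#   token_score 1 0.5 2 1     seen twice: 0.8333, far enough out to count
```

With those made-up settings a token has to turn up in a second spam 
before it influences the result at all, which is the sort of delay I 
mean.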




