corrupted db files?

David Relson relson at osagesoftware.com
Tue Dec 31 17:53:11 CET 2002


Fletcher,

There's definitely something wrong.  A token's count should be less than
the number of messages (MSG_COUNT) processed (if using the Robinson or
Robinson-Fisher methods), or less than 4 times MSG_COUNT (if using the
Graham method).  Also, all calls to db_setvalue() check for negative values
and replace them with 0.  You should never see huge values like the ones
you're reporting.

What version of bogofilter are you running?  If you're not on the latest
stable version, i.e. 0.9.1.2, you should upgrade to get all the newest
features and the best code.

If you start with a fresh database can you reproduce the problem?

David



At 11:24 AM 12/31/02, Fletcher Mattox wrote:

>Occasionally the word count values in my db files get very large.
>Many of them are near 2^32, perhaps suggesting integer overflow
>of some type.

... [snip] ...

>There probably are 3369 messages in this db, so maybe this is just
>an artifact of the way Berkeley DB works?  Should I be concerned?

I'm not sure what the upper limits are, but I personally have 10x as many
messages, and I'm sure Berkeley DB can handle millions.  Its limit is
likely 2^32-1 or something like that.  So you're not bumping into a DB
limit, but there is still cause for concern: big numbers like that are WRONG.

David




