Better database??

Alain fvvvgyh02 at sneakemail.com
Mon Mar 29 20:39:17 CEST 2004


Hi

A question about a rather old post:

><michael <at> optusnet.com.au> wrote
>>David Relson <relson <at> osagesoftware.com> writes:
[..]
>> > In the end it was just too hard to do, so I reverted to 32
>> > bit counters.
>> 
>> I can see 16 bit counters getting messy.  You were wise to revert 
>
>I'm still bitter about it. I could chop 25% off the database size by doing
>it! :)
>Michael.

Are there special issues here? 

I thought of going even further, to 9-bit (or 10-bit) counters.  With counters
that small, I think it's possible to get down to 8 bytes per token in a closed
hash table with a minimum of 256K tokens.
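
To make the 8 bytes per token figure concrete, here is a minimal sketch of one possible slot layout. The layout is an assumption and is not specified in this post: two 10-bit counters share a 64-bit word with a 44-bit fingerprint of the token hash. The names `slot_pack`, `slot_ham`, `slot_spam`, `slot_hash`, `COUNTER_BITS` and `HASH_BITS` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: [ 44-bit hash fingerprint | 10-bit ham | 10-bit spam ] */
enum { COUNTER_BITS = 10, HASH_BITS = 64 - 2 * COUNTER_BITS };
#define COUNTER_MAX ((1u << COUNTER_BITS) - 1u)   /* 1023 */

/* Pack a token fingerprint and both counters into one 8-byte slot. */
static uint64_t slot_pack(uint64_t hash, unsigned ham, unsigned spam)
{
    assert(ham <= COUNTER_MAX && spam <= COUNTER_MAX);
    uint64_t fp = hash & ((UINT64_C(1) << HASH_BITS) - 1);  /* keep low 44 bits */
    return (fp << (2 * COUNTER_BITS))
         | ((uint64_t)ham << COUNTER_BITS)
         | (uint64_t)spam;
}

/* Accessors for the three fields of a packed slot. */
static unsigned slot_ham(uint64_t slot)
{
    return (unsigned)((slot >> COUNTER_BITS) & COUNTER_MAX);
}

static unsigned slot_spam(uint64_t slot)
{
    return (unsigned)(slot & COUNTER_MAX);
}

static uint64_t slot_hash(uint64_t slot)
{
    return slot >> (2 * COUNTER_BITS);
}
```

With this layout the table is just an array of `uint64_t`, so the per-token cost is exactly 8 bytes; the price is that tokens are identified only by a 44-bit fingerprint rather than by the full token text.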


The spam/ham ratio should of course be preserved when a count exceeds the
maximum.  Here is some pseudocode:

unsigned ham;   // number of occurrences in ham
unsigned spam;  // number of occurrences in spam
unsigned max;   // maximum value a counter can hold

if (spam > ham)
{
  while (spam > max)
  {
    if (ham > 1)
    {
      spam -= (spam / ham);  // at least 1, since spam > ham
      --ham;
    }
    else
    {
      spam = max;            // ratio can no longer be kept; clamp
    }
  }
}
else
{
  while (ham > max)
  {
    if (spam > 1)
    {
      ham -= (ham / spam);   // at least 1, since ham >= spam
      --spam;
    }
    else
    {
      ham = max;             // ratio can no longer be kept; clamp
    }
  }
}
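
The pseudocode above can be turned into a compilable C function. This is a minimal sketch rather than bogofilter code: the function name `scale_counts` and the pointer-based interface are assumptions made for illustration, and the two symmetric branches are merged by pointing at the larger and the smaller counter.

```c
#include <assert.h>

/* Shrink a (ham, spam) pair until neither count exceeds max, keeping the
 * spam/ham ratio roughly intact.  Hypothetical helper, not bogofilter API. */
static void scale_counts(unsigned *ham, unsigned *spam, unsigned max)
{
    assert(max >= 1);

    /* Same case split as the pseudocode: spam > ham, otherwise ham >= spam. */
    unsigned *big   = (*spam > *ham) ? spam : ham;
    unsigned *small = (*spam > *ham) ? ham  : spam;

    while (*big > max) {
        if (*small > 1) {
            *big -= *big / *small;  /* removes one "share" of the larger count */
            --*small;
        } else {
            *big = max;             /* smaller count is 0 or 1: just clamp */
        }
    }
}
```

For example, with `max` = 1023 (a 10-bit counter), ham = 10 and spam = 2000 are reduced to ham = 5 and spam = 1000, so the 1:200 ratio survives. Once the smaller count has dropped to 0 or 1, the larger one is simply clamped to `max` and the ratio is lost.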
  

Alain




