Better database??
Alain
fvvvgyh02 at sneakemail.com
Mon Mar 29 20:39:17 CEST 2004
Hi
A question about a rather old post:
><michael <at> optusnet.com.au> wrote
>>David Relson <relson <at> osagesoftware.com> writes:
[..]
>> > In the end it was just too hard to do, so I reverted to 32
>> > bit counters.
>>
>> I can see 16 bit counters getting messy. You were wise to revert
>
>I'm still bitter about it. I could chop 25% off the database size by doing
>it! :)
>Michael.
Are there special issues here?
I thought of going to even 9-bit (or 10-bit) counters. I think it's possible
to get to 8 bytes per token in a closed hash table with a minimum of 256K
tokens with those small counters.
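
To make the 8 bytes per token figure concrete, here is a rough sketch of one possible slot layout. The split is only an assumption for illustration: a 44-bit hash fingerprint plus two 10-bit counters in one 64-bit word. The names slot_t, slot_pack, slot_fp, slot_ham and slot_spam are made up here and are not bogofilter code.

```c
#include <stdint.h>

/* Assumed layout (hypothetical, not from bogofilter):
 *   bits 63..20  44-bit hash fingerprint of the token
 *   bits 19..10  ham counter  (10 bits)
 *   bits  9..0   spam counter (10 bits)
 */
#define COUNT_BITS 10u
#define COUNT_MAX  ((1u << COUNT_BITS) - 1u)        /* 1023 */
#define FP_BITS    (64u - 2u * COUNT_BITS)          /* 44 */
#define FP_MASK    ((UINT64_C(1) << FP_BITS) - 1u)

typedef uint64_t slot_t;                            /* 8 bytes per token */

/* Pack a fingerprint and two counters; excess bits are masked off. */
static slot_t slot_pack(uint64_t fp, unsigned ham, unsigned spam)
{
    return ((fp & FP_MASK) << (2u * COUNT_BITS))
         | ((uint64_t)(ham & COUNT_MAX) << COUNT_BITS)
         | (uint64_t)(spam & COUNT_MAX);
}

static uint64_t slot_fp(slot_t s)   { return s >> (2u * COUNT_BITS); }
static unsigned slot_ham(slot_t s)  { return (unsigned)((s >> COUNT_BITS) & COUNT_MAX); }
static unsigned slot_spam(slot_t s) { return (unsigned)(s & COUNT_MAX); }
```

With this layout the table stores no token text at all, only a fingerprint, so two tokens with the same fingerprint would share counters. Whether that is acceptable is a separate question.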
The spam/ham ratio should of course stay roughly the same when a counter
would overflow. Here is some pseudocode:
unsigned ham;   // # occurrences inside ham
unsigned spam;  // # occurrences inside spam
unsigned max;   // maximum allowed value

if (spam > ham)
{
    while (spam > max)
    {
        if (ham > 1)
        {
            spam -= (spam / ham); // at least 1, since spam > ham
            --ham;
        }
        else
        {
            spam = max;           // ham is 0 or 1: just saturate
        }
    }
}
else
{
    while (ham > max)
    {
        if (spam > 1)
        {
            ham -= (ham / spam);  // at least 1, since ham >= spam
            --spam;
        }
        else
        {
            ham = max;            // spam is 0 or 1: just saturate
        }
    }
}
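
The pseudocode above can be folded into one small C function. This is only a sketch of the same idea. The name clamp_counts and the pointer-based interface are made up for illustration, and max would be 1023 for 10-bit counters.

```c
/* Scale a (ham, spam) pair down in place until the larger count fits
 * in max, keeping the ratio between the two roughly the same.
 * Hypothetical helper, not part of bogofilter. */
static void clamp_counts(unsigned *ham, unsigned *spam, unsigned max)
{
    /* Pick the larger counter once. Each step keeps big >= small,
     * so the roles never swap and big / small is at least 1. */
    unsigned *big   = (*spam > *ham) ? spam : ham;
    unsigned *small = (*spam > *ham) ? ham : spam;

    while (*big > max) {
        if (*small > 1) {
            *big -= *big / *small;  /* remove one "share" of the big count */
            --*small;               /* and one unit of the small count */
        } else {
            *big = max;             /* small is 0 or 1: just saturate */
        }
    }
}
```

For example, with max = 1023 a pair of ham = 4, spam = 2000 goes to ham = 3, spam = 1500 and then to ham = 2, spam = 1000, so the 500:1 ratio survives. A pair of ham = 1, spam = 5000 simply saturates to spam = 1023.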
Alain