memory usage

David Relson relson at osagesoftware.com
Fri Mar 11 13:41:03 CET 2005


Greetings,

Sparked by yesterday's question on memory usage, I decided to run a test!

With a ham mailbox of 641 MB (90,038 messages; 633,935 different tokens),
bogofilter used 124 MB of RAM.  With a spam mailbox of 1008 MB (209,611
messages; 2,599,130 different tokens), it used 323 MB of RAM.

The numbers indicate that memory usage is somewhat high but under
control, with no sign of leaks.
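
To put the two runs on the same scale, here is a small sketch that
turns them into bytes of RAM per distinct token.  It assumes 1 MB means
2^20 bytes and charges all of the process's RAM to token storage
(ignoring any fixed overhead); bytes_per_token() is a name made up for
this illustration, not a bogofilter function.

```c
#include <stdint.h>

/* Illustrative helper (not part of bogofilter): average RAM per
 * distinct token, assuming 1 MB = 1024 * 1024 bytes and that all
 * of the measured RAM is attributable to token storage. */
double bytes_per_token(double megabytes, uint64_t tokens)
{
    return megabytes * 1024.0 * 1024.0 / (double)tokens;
}
```

Under those assumptions, bytes_per_token(124, 633935) comes to about
205 bytes per token for the ham run and bytes_per_token(323, 2599130)
to about 130 for the spam run.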

Taking a deeper look, each token has storage for its bytes (absolutely
necessary!) as well as 7 words (4 bytes each) for counts, frequencies,
and spam score.  Of these 7 words, the 2 for spam and ham counts are
needed in all of bogofilter's modes (especially registering and
scoring).  Scoring uses 2 words for message counts (to compute the
probability) and a double (2 words) for the probability itself.  Since
the message counts are needed only to compute the probability (and not
after that), the counts and the probability could apparently share
storage in a union (saving 8 bytes per token).  The frequency count
(the remaining word) is needed during registration but not scoring, so
it appears that freq can also go in the union (saving another 4 bytes).
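
A minimal sketch of that idea follows.  The struct and field names are
invented for illustration and are not bogofilter's actual
declarations; it assumes 32-bit counters and an 8-byte double.  The
probability formula is only a placeholder to show the counts being
read before the same storage is overwritten.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "before" layout: 7 words of per-token data. */
struct token_before {
    uint32_t spam_count;   /* needed in all modes */
    uint32_t ham_count;    /* needed in all modes */
    uint32_t spam_msgs;    /* scoring: input to the probability */
    uint32_t ham_msgs;     /* scoring: input to the probability */
    double   prob;         /* scoring: the result (2 words) */
    uint32_t freq;         /* registration only */
};

/* Hypothetical "after" layout: fields that are never live at the
 * same time share storage. */
struct token_after {
    uint32_t spam_count;
    uint32_t ham_count;
    union {
        struct { uint32_t spam_msgs, ham_msgs; } msgs;
        double   prob;
        uint32_t freq;
    } u;
};

/* Sum of the member sizes in the "before" layout, ignoring any
 * padding the compiler may add. */
size_t payload_bytes_before(void)
{
    struct token_before t;
    return sizeof t.spam_count + sizeof t.ham_count
         + sizeof t.spam_msgs + sizeof t.ham_msgs
         + sizeof t.prob + sizeof t.freq;
}

/* Placeholder scoring step (NOT bogofilter's formula): copy the
 * message counts out of the union first, then overwrite the same
 * bytes with the probability. */
void compute_prob(struct token_after *t)
{
    assert(t != NULL);
    uint32_t spam_msgs = t->u.msgs.spam_msgs;
    uint32_t ham_msgs  = t->u.msgs.ham_msgs;
    double bad  = spam_msgs ? (double)t->spam_count / spam_msgs : 0.0;
    double good = ham_msgs  ? (double)t->ham_count  / ham_msgs  : 0.0;
    t->u.prob = (bad + good > 0.0) ? bad / (bad + good) : 0.5;
}
```

With these assumptions the per-token data shrinks from 28 bytes to 16,
the 12 bytes mentioned above.  (A 64-bit compiler may pad the first
layout to 32 bytes, so the real saving depends on the platform.)  For
the 2,599,130 tokens of the spam run, 12 bytes per token would be
about 31 million bytes, roughly 30 MB.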

I'm not 100% sure this will all work, but am experimenting and
testing.  There may be additional efficiencies possible.  Time will
tell :-)

Regards,

David
