Excessive memory usage: bug?

Peter Bishop pgb at adelard.com
Mon Mar 14 13:27:11 CET 2005


On 10 Mar 2005 at 18:55, David Relson wrote:

> H'lo Juan,
> 
> When registering a mailbox (like you're doing), bogofilter does the
> following:
> 
>   1. create a master wordlist
>   2. read one message
>   3. convert it to a list of tokens
>   4. merge the new tokens with the master list
>   5. repeat steps 2-4 for all messages
>   6. update the database with the tokens of the master wordlist
> 
> The above technique uses a fair amount of ram but minimizes the disk
> access for reading and writing the database.
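
For reference, here is a minimal sketch of the six steps quoted above, written in Python rather than bogofilter's own C. The names `tokenize` and `register_all_in_memory` are hypothetical, the tokenizer is a plain whitespace split standing in for the real lexer, and a dict stands in for the on-disk database.

```python
from collections import Counter


def tokenize(message):
    # Hypothetical stand-in for the real lexer: lowercase whitespace split.
    return message.lower().split()


def register_all_in_memory(messages, database):
    """Accumulate every token in one master list, then write once."""
    master = Counter()                       # step 1: master wordlist
    for message in messages:                 # steps 2 and 5: each message
        master.update(tokenize(message))     # steps 3 and 4: tokenize, merge
    for token, count in master.items():      # step 6: single database update
        database[token] = database.get(token, 0) + count
    return len(master)  # distinct tokens held in RAM at the end


db = {}
peak = register_all_in_memory(["buy cheap pills", "cheap pills now"], db)
```

The master list holds every distinct token in the mailbox at once, so its size grows with the mailbox.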

Is there a problem with this approach?
If the master wordlist exceeds the available RAM, there might be an 
"out of memory" failure once the database exceeds a certain size. 
Even if this does not happen (e.g. because of virtual memory), there 
certainly could be a lot of virtual memory thrashing if the actual 
available RAM size were exceeded.

An alternative method, which should work regardless of memory size, 
is:

1. create an empty *token* list in RAM
2. read one message
3. convert it to a list of tokens
4. merge the new tokens with the token list
5. repeat steps 2-4 until some maximum token limit is reached
6. read the current token counts from the database
7. update the database tokens with the extra token counts
8. write the tokens back to the database
9. go back to step 1 until the mailbox is empty
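
A minimal sketch of the batched method, in Python rather than bogofilter's own C, might look like this. Everything named here is an assumption for illustration: `tokenize` is a plain whitespace split, `register_in_batches` and `max_tokens` are hypothetical names, and a dict stands in for the on-disk database.

```python
from collections import Counter


def tokenize(message):
    # Hypothetical stand-in for the real lexer: lowercase whitespace split.
    return message.lower().split()


def register_in_batches(messages, database, max_tokens=100_000):
    """Flush the in-RAM token list whenever it reaches max_tokens."""
    batch = Counter()                        # empty token list in RAM
    peak = 0                                 # largest batch seen so far

    def flush():
        # Read each current count, add the extra count, write it back.
        for token, count in batch.items():
            database[token] = database.get(token, 0) + count
        batch.clear()                        # start again with an empty list

    for message in messages:                 # read one message at a time
        batch.update(tokenize(message))      # tokenize and merge
        peak = max(peak, len(batch))
        if len(batch) >= max_tokens:         # token limit reached
            flush()
    flush()                                  # leftover tokens at the end
    return peak


db = {}
peak = register_in_batches(["a b", "b c", "c d"], db, max_tokens=2)
```

The limit is only checked after a whole message has been merged, so the list can overshoot `max_tokens` by at most one message's worth of tokens. The cost is more database reads and writes, since a token that appears in several batches is updated once per batch.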
-- 
Peter Bishop 
pgb at adelard.com
pgb at csr.city.ac.uk

