Excessive memory usage: bug?
Peter Bishop
pgb at adelard.com
Mon Mar 14 13:27:11 CET 2005
On 10 Mar 2005 at 18:55, David Relson wrote:
> H'lo Juan,
>
> When registering a mailbox (like you're doing), bogofilter does the
> following:
>
> 1. create a master wordlist
> 2. read one message
> 3. convert it to a list of tokens
> 4. merge the new tokens with the master list
> 5. repeat steps 2-4 for all messages
> 6. update the database with the tokens of the master wordlist
>
> The above technique uses a fair amount of ram but minimizes the disk
> access for reading and writing the database.
Is there a problem with this approach?
If the master wordlist exceeds the available RAM, there might be an
"out of memory" failure once the database exceeds a certain size.
Even if this does not happen (e.g. because of virtual memory), there
could certainly be a lot of virtual memory thrashing if the actual
available RAM size were exceeded.
An alternative method, which should work regardless of memory size,
is:
1. create an empty *token* list in RAM
2. read one message
3. convert it to a list of tokens
4. merge the new tokens with the token list
5. repeat steps 2-4 until some maximum token limit is reached
6. read the current token counts from the database
7. update the database tokens with the extra token counts
8. write the tokens back to the database
9. go back to step 1 until the mailbox is empty
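
The batched scheme above could be sketched in Python roughly as follows.
This is only an illustration under stated assumptions, not bogofilter's
actual code: tokenize(), flush(), register_mailbox() and the max_tokens
parameter are hypothetical names, the tokenizer is a plain whitespace
split, and the "database" is any dict-like mapping of token to count.

```python
from collections import Counter


def tokenize(message):
    # Hypothetical stand-in for a real lexer: lowercase and split on whitespace.
    return message.lower().split()


def flush(batch, database):
    # Read each token's current count, add the batch count, and write it
    # back (the read / update / write steps of the scheme).
    for token, count in batch.items():
        database[token] = database.get(token, 0) + count


def register_mailbox(messages, database, max_tokens=100_000):
    """Register messages in bounded batches; return the number of flushes.

    `database` is assumed to be a dict-like mapping of token -> count.
    """
    batch = Counter()                    # empty token list in RAM
    flushes = 0
    for message in messages:             # read one message
        batch.update(tokenize(message))  # tokenize and merge into the batch
        if len(batch) >= max_tokens:     # token limit reached
            flush(batch, database)
            batch.clear()                # start again with an empty list
            flushes += 1
    if batch:                            # write out whatever is left over
        flush(batch, database)
        flushes += 1
    return flushes
```

Because the limit is checked only after a whole message has been merged,
the batch can overshoot max_tokens by at most the tokens of one message.
Peak memory then depends on the chosen limit rather than on the size of
the mailbox, at the cost of one database pass per batch.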
--
Peter Bishop
pgb at adelard.com
pgb at csr.city.ac.uk
_______________________________________________
Bogofilter mailing list
Bogofilter at bogofilter.org
http://www.bogofilter.org/mailman/listinfo/bogofilter