ext3fs slowness -- how things proceed

Greg Louis glouis at dynamicro.on.ca
Wed Feb 5 13:32:55 CET 2003


On 20030205 (Wed) at 0721:05 -0500, David Relson wrote:
> At 07:01 AM 2/5/03, Greg Louis wrote:

> At present, bogofilter creates an unsorted set of words.  As it creates the 
> word set, duplicates are discarded.  It then goes through this unordered 
> word set to compute spam scores.  For each word bogofilter gets the count 
> from the spamlist, then the good list, i.e. it alternates between 
> wordlists.  It seems like this is a worst-case scenario for the database.
> 
> At the moment, I'm thinking of a two part patch.  First, sort the 
> tokens.  That will allow bogofilter to perform database access in an 
> ordered manner.  Second, do all the work for the spamlist, then for the 
> goodlist.  That should minimize cache needs.
> 
> If I build it, will you test it?

Yes indeed.

Part 1 will also apply during registration, I presume?  Part 2 as well,
come to think of it, with -N and -S?  Or would it be better in that
case to sacrifice speed in favour of data integrity and move one token
at a time from one list to the other -- presorting will still help a
lot, I expect?

-- 
| G r e g  L o u i s          | gpg public key:      |
|   http://www.bgl.nu/~glouis |   finger greg at bgl.nu |
| Help free our mailboxes. Include                   |
|        http://wecanstopspam.org in your signature. |
