ext3fs slowness -- how things proceed
Greg Louis
glouis at dynamicro.on.ca
Wed Feb 5 13:32:55 CET 2003
On 20030205 (Wed) at 0721:05 -0500, David Relson wrote:
> At 07:01 AM 2/5/03, Greg Louis wrote:
> At present, bogofilter creates an unsorted set of words. As it creates the
> word set, duplicates are discarded. It then goes through this unordered
> word set to compute spam scores. For each word bogofilter gets the count
> from the spamlist, then from the goodlist, i.e. it alternates between
> wordlists. It seems like this is a worst-case scenario for the database.
>
> At the moment, I'm thinking of a two-part patch. First, sort the
> tokens. That will allow bogofilter to perform database access in an
> ordered manner. Second, do all the work for the spamlist, then for the
> goodlist. That should minimize cache needs.
>
> If I build it, will you test it?
Yes indeed.
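
For illustration, the two parts described above might look roughly like
this. It is a minimal Python sketch under stated assumptions, not
bogofilter's code: WordList, lookup_counts and the in-memory dicts are
hypothetical stand-ins for the real database-backed wordlists, and the
log exists only to show the order in which lookups happen.

```python
class WordList:
    """Hypothetical stand-in for one on-disk wordlist (not bogofilter's API)."""

    def __init__(self, name, counts, log):
        self.name = name
        self.counts = counts
        self.log = log  # shared list recording the order of lookups

    def get(self, token):
        self.log.append((self.name, token))
        return self.counts.get(token, 0)


def lookup_counts(tokens, spamlist, goodlist):
    """Return (token, spam_count, good_count) for each distinct token."""
    # Part 1: discard duplicates, then sort so lookups proceed in key order.
    ordered = sorted(set(tokens))
    # Part 2: finish all spamlist lookups before starting on the goodlist.
    spam = [spamlist.get(tok) for tok in ordered]
    good = [goodlist.get(tok) for tok in ordered]
    return list(zip(ordered, spam, good))
```

Whether key-ordered access actually helps will depend on how the
database lays out its keys on disk, which is what testing should show.
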
Part 1 will also apply during registration, I presume? Part 2 as well,
come to think of it, with -N and -S? Or would it be better in that
case to sacrifice speed in favour of data integrity and move one token
at a time from one list to the other -- presorting will still help a
lot, I expect?
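
To make the second alternative concrete, here is a rough Python sketch
of moving tokens one at a time in sorted order. Everything in it is an
assumption for illustration: move_tokens is a hypothetical helper, plain
dicts stand in for the two wordlists, and clamping at the count actually
present in the source list is my guess at sensible behaviour, not
something bogofilter is known to do.

```python
def move_tokens(message_counts, src, dst):
    """Move each token's count from src to dst, one token at a time.

    message_counts maps token -> occurrences in the message being moved.
    src and dst are dicts standing in for the two wordlists.
    """
    # Presorting keeps the accesses to both lists in key order.
    for tok in sorted(message_counts):
        # Assumption: never remove more than src actually holds.
        n = min(message_counts[tok], src.get(tok, 0))
        if n == 0:
            continue
        # The two updates for a token sit next to each other, so an
        # interruption leaves at most one token half-moved.
        remaining = src[tok] - n
        if remaining:
            src[tok] = remaining
        else:
            del src[tok]
        dst[tok] = dst.get(tok, 0) + n
```

This alternates between the lists once per token, so it gives up the
list-at-a-time batching of part 2 while keeping the sorted order of
part 1.
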
--
| G r e g L o u i s | gpg public key: |
| http://www.bgl.nu/~glouis | finger greg at bgl.nu |
| Help free our mailboxes. Include |
| http://wecanstopspam.org in your signature. |