One list vs two (was Degeneration thought)

Greg Louis glouis at dynamicro.on.ca
Thu Jun 5 16:41:58 CEST 2003


On 20030605 (Thu) at 1504:19 +0100, Peter Bishop wrote:

> I suppose a combined wordlist *could* be more efficient in time 
> and maybe storage space (but this is not certain)

We have tested this in a couple of scenarios, including a production
environment.  A combined list is faster if an appropriate cache size
is used, and slightly more compact in storage.  The savings are not
huge in either case: database lookup isn't as time-consuming as the
lexical analysis, so saving a third of the lookup time only results in
about a 10% overall speedup, even with big message or mbox files.  My
personal training db went from 32 MB to 30 MB when I converted from two
lists to one.
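
To make the arithmetic behind those figures explicit, here is a rough
sketch in Python.  The 30% share of run time spent in database lookup is
an assumption inferred from the numbers above (a third of the lookup
time corresponding to about 10% overall), not a separately measured
value, and the function name is purely illustrative.

```python
def overall_saving(lookup_fraction, lookup_saving):
    """Fraction of total run time saved when only the lookup phase gets faster."""
    return lookup_fraction * lookup_saving


# Assumption: lookup accounts for roughly 30% of total run time.
# This share is inferred from the figures in the message, not measured here.
lookup_fraction = 0.30

# From the message: the combined list saves about a third of the lookup time.
lookup_saving = 1 / 3

time_saved = overall_saving(lookup_fraction, lookup_saving)

# From the message: the training database shrank from 32 MB to 30 MB.
storage_saved = (32 - 30) / 32

print(f"overall run time saved: {time_saved:.0%}")
print(f"storage saved: {storage_saved:.2%}")
```

Under that assumption the lexical analysis and everything else take the
remaining 70% of the time and are unaffected, which is why the overall
gain stays small however much the lookup itself improves.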

-- 
| G r e g  L o u i s          | gpg public key: finger     |
|   http://www.bgl.nu/~glouis |   glouis at consultronics.com |
| http://wecanstopspam.org in signatures fights junk email |



