question about token list sort

David Relson relson at osagesoftware.com
Wed Sep 28 00:16:26 CEST 2005


On Tue, 27 Sep 2005 14:22:19 -0400
Zhenyu Zhong wrote:

> Hi All,
>  I have a question. It seems that after a message comes in, tokens will be
> extracted and inserted into a token hash list. But after that it will do a
> sort on the hash list, I want to know why this sort is necessary. I don't
> think it helps for the later lookup in the DB. Am I missing something
> here? Can anyone help me out?

Hello Zhong,

BerkeleyDB keeps its key/value pairs in key order, so looking up
tokens in sorted order means consecutive lookups tend to land in
neighboring parts of the database, which helps performance.  In a Linux
environment, if the wordlist is used a lot there's a good chance it'll
end up in the kernel's cache, and then the order of token lookup becomes
unimportant.
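
To make the locality argument concrete, here is a toy model (not
BerkeleyDB's actual storage layout, just an assumption for
illustration): the wordlist is a sorted list of keys cut into fixed-size
"pages", and we count how often consecutive lookups move to a different
page.  The function name page_switches, the page size, and the token
names are all made up for this sketch.

```python
import bisect

def page_switches(sorted_keys, lookups, page_size=4):
    """Count how often consecutive lookups land on a different page.

    Toy model only: a 'page' is a run of page_size adjacent keys in the
    sorted key list.  The very first lookup counts as one switch.
    """
    switches = 0
    last_page = None
    for token in lookups:
        idx = bisect.bisect_left(sorted_keys, token)
        page = idx // page_size
        if page != last_page:
            switches += 1
            last_page = page
    return switches

# Hypothetical wordlist: 32 keys, i.e. 8 pages of 4 keys each.
wordlist = sorted(f"tok{i:03d}" for i in range(32))

# Tokens in the order they happened to appear in a message.
message = ["tok030", "tok001", "tok017", "tok002", "tok029", "tok016"]

unsorted_cost = page_switches(wordlist, message)
sorted_cost = page_switches(wordlist, sorted(message))
print(unsorted_cost, sorted_cost)
```

In this model the unsorted order jumps to a new page on every lookup,
while the sorted order visits each needed page once.  When everything
is already cached, those jumps cost very little.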

One place it really does matter is registering multiple messages.
Bogofilter sorts the tokens of each message and collates all the sorted
lists.  This allows a single pass over the database when the update
actually happens.
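
The sort-and-collate idea can be sketched roughly as follows.  This is
not bogofilter's code (bogofilter is written in C); it is a minimal
Python illustration, and the names collate and apply_updates, the
in-memory list standing in for the database, and the choice to count
each token once per message are all assumptions of the sketch.

```python
import heapq

def collate(messages):
    """Sort each message's tokens, then merge the sorted lists into
    (token, count) pairs in ascending token order.

    Assumption: a token is counted once per message it appears in.
    """
    sorted_lists = [sorted(set(tokens)) for tokens in messages]
    collated = []
    for token in heapq.merge(*sorted_lists):
        if collated and collated[-1][0] == token:
            collated[-1][1] += 1
        else:
            collated.append([token, 1])
    return [(token, count) for token, count in collated]

def apply_updates(db_items, updates):
    """Merge sorted updates into sorted (key, count) items in one pass.

    Both inputs are walked front to back exactly once, which is only
    possible because both are in key order.
    """
    result = []
    i = j = 0
    while i < len(db_items) or j < len(updates):
        if j == len(updates) or (
            i < len(db_items) and db_items[i][0] < updates[j][0]
        ):
            result.append(db_items[i])      # key untouched by updates
            i += 1
        elif i == len(db_items) or updates[j][0] < db_items[i][0]:
            result.append(updates[j])       # key new to the database
            j += 1
        else:                               # same key: add the counts
            key, count = db_items[i]
            result.append((key, count + updates[j][1]))
            i += 1
            j += 1
    return result

messages = [
    ["free", "money", "now"],
    ["money", "meeting"],
    ["free", "free", "offer"],
]
db = [("free", 10), ("meeting", 3), ("zebra", 1)]

updates = collate(messages)
new_db = apply_updates(db, updates)
print(new_db)
```

Without the sorting, each message's tokens would have to be looked up
and written back separately, in whatever order they arrived.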

HTH,

David
 


