Patch for fewer string compares [was: Plan for performance improvement]

David Relson relson at osagesoftware.com
Sat Sep 14 05:39:36 CEST 2002


<x-flowed>
Adrian,

The number of strcmp() calls can easily be reduced significantly.
Consider the following:

If a new token _is_ the same as one in the array, then the new token's
probability will equal that entry's probability.  So, when looping
through the array, check first for equal probabilities and only then for
equal names.  This is quicker, because strcmp() is called only when the
probabilities match, which is a necessary condition for the names to match.
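A minimal sketch of the idea, assuming a simplified stand-in for the
extrema array.  The `struct extremum` type, `find_duplicate()`,
`counted_strcmp()` and the `strcmp_calls` counter are hypothetical names
for illustration, not bogofilter's actual code.  The cheap `double`
comparison runs first, so strcmp() is only reached for slots whose
probability already matches.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one slot of stats.extrema. */
struct extremum {
    const char *key;   /* token text, or NULL if the slot is empty */
    double      prob;  /* probability computed for that token */
};

/* Counts how many times strcmp() is actually called (illustration only). */
static size_t strcmp_calls = 0;

static int counted_strcmp(const char *a, const char *b)
{
    strcmp_calls++;
    return strcmp(a, b);
}

/*
 * Returns the index of the slot holding `token`, or -1 if none does.
 * When use_prob_guard is nonzero, the probabilities are compared first:
 * the same token always yields the same probability, so unequal
 * probabilities rule out a match without calling strcmp().
 */
static int find_duplicate(const struct extremum *slots, size_t n,
                          const char *token, double prob, int use_prob_guard)
{
    assert(slots != NULL && token != NULL);
    for (size_t i = 0; i < n; i++) {
        if (slots[i].key == NULL)
            continue;                 /* empty slot */
        if (use_prob_guard && slots[i].prob != prob)
            continue;                 /* cannot be the same token */
        if (counted_strcmp(slots[i].key, token) == 0)
            return (int)i;
    }
    return -1;
}
```

With slots {"free", 0.99}, {"money", 0.95} and {"hello", 0.10}, looking
up "hello" with probability 0.10 calls strcmp() three times without the
guard but only once with it; both return index 2.  The shortcut relies
on the same token always producing a bit-identical probability, which
holds when both values come from the same computation.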

Here's the patch:

--- bogofilter-0.7.3.1/bogofilter.c~	Fri Sep 13 21:00:46 2002
+++ bogofilter-0.7.3.1/bogofilter.c	Fri Sep 13 23:31:23 2002
@@ -481,7 +481,7 @@
 	for (pp = stats.extrema; pp < stats.extrema+sizeof(stats.extrema)/sizeof(*stats.extrema); pp++)
         {
 	    // don't allow duplicate tokens in the stats.extrema
-	    if (pp->key && strcmp(pp->key, yytext)==0)
+	    if (pp->key && pp->prob == prob && strcmp(pp->key, yytext)==0)
             {
                 hit=NULL;
 		break;

As part of my testing earlier today, I added printf() statements that
report what happens when new tokens are added to the array.  With the
above patch, I get exactly the same sequence of tokens being added as I
got before the patch.
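Besides printf() tracing, the equivalence can be checked directly.  The
sketch below, again assuming a hypothetical `struct extremum` layout
(not bogofilter's real types), transcribes the old and new conditions
from the patch into `old_match()` and `new_match()` and reports whether
they agree for every slot.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one slot of stats.extrema. */
struct extremum {
    const char *key;   /* token text, or NULL if the slot is empty */
    double      prob;  /* probability computed for that token */
};

/* Condition before the patch: slot in use and text equal. */
static int old_match(const struct extremum *pp, const char *yytext)
{
    return pp->key && strcmp(pp->key, yytext) == 0;
}

/* Condition after the patch: the probability check comes first. */
static int new_match(const struct extremum *pp, const char *yytext, double prob)
{
    return pp->key && pp->prob == prob && strcmp(pp->key, yytext) == 0;
}

/*
 * Returns 1 if both conditions give the same answer for every slot, else 0.
 * They can only differ when a slot holds the same text as yytext but a
 * different probability, i.e. when probability is not a function of the
 * token alone.
 */
static int conditions_agree(const struct extremum *slots, size_t n,
                            const char *yytext, double prob)
{
    assert(yytext != NULL);           /* strcmp() needs a valid string */
    for (size_t i = 0; i < n; i++)
        if (old_match(&slots[i], yytext) != new_match(&slots[i], yytext, prob))
            return 0;
    return 1;
}
```

Calling `conditions_agree()` for each incoming token would flag any case
where the patch changes behaviour; it should always return 1 as long as
each token's probability is computed deterministically.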

David

P.S.  I had been thinking of this patch, but hadn't gotten around to it 
until your message spurred me to action :-)

</x-flowed>



More information about the bogofilter-dev mailing list