Patch for fewer string compares [was: Plan for performance improvement]

Adrian Otto aotto at aotto.com
Sat Sep 14 18:37:07 CEST 2002


Nice!

> -----Original Message-----
> From: David Relson [mailto:relson at osagesoftware.com]
> Sent: Friday, September 13, 2002 8:40 PM
> To: bogofilter-dev at aotto.com
> Subject: Patch for fewer string compares [was: Plan for performance improvement]
>
>
> Adrian,
>
> The number of strcmp() calls can be reduced significantly, and very
> easily.  Consider the following:
>
> If a new token _is_ the same as one in the array, then the new token's
> probability will equal a probability in the array.  So, when looping
> through the array, check first for equal probabilities and then for equal
> names.  This will be quicker, because strcmp() is called only when the
> probabilities match, which is a necessary condition for the names to match.
>
> Here's the patch:
>
> --- bogofilter-0.7.3.1/bogofilter.c~	Fri Sep 13 21:00:46 2002
> +++ bogofilter-0.7.3.1/bogofilter.c	Fri Sep 13 23:31:23 2002
> @@ -481,7 +481,7 @@
>   	for (pp = stats.extrema; pp < stats.extrema+sizeof(stats.extrema)/sizeof(*stats.extrema); pp++)
>           {
>   	    // don't allow duplicate tokens in the stats.extrema
> -	    if (pp->key && strcmp(pp->key, yytext)==0)
> +	    if (pp->key && pp->prob == prob && strcmp(pp->key, yytext)==0)
>               {
>                   hit=NULL;
>   		break;
>
> As part of my testing earlier today, I added printf() statements that tell
> me what's happening when new tokens are added to the array.  With the
> above patch, I get exactly the same sequence of tokens being added as I
> got before the patch.
>
> David
>
> P.S.  I had been thinking of this patch, but hadn't gotten around to it
> until your message spurred me to action :-)
>
>


