extreme weirdness with RF & 0.10.0

David Relson relson at osagesoftware.com
Tue Jan 21 04:53:04 CET 2003


At 10:28 PM 1/20/03, Barry Gould wrote:

>At 10:12 PM 1/20/2003 -0500, David Relson wrote:
>
>>What are the values of .MSG_COUNT?  "bogofilter -w /path/to/wordlists 
>>.MSG_COUNT" will give the info.
>
>                        spam   good
>.MSG_COUNT             8252  31411

As I suspected.  The token counts exceed the message counts.
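
A minimal sketch of the clamping that the attached patch applies before 
computing Robinson's f(w): each token count is limited to the corresponding 
message count.  The helper names min_d and fw_clamped, the EPS value, and 
passing robs and robx as parameters are assumptions made for illustration 
only; bogofilter's own definitions are not shown in this message.

```c
#include <assert.h>
#include <math.h>

/* Assumed tolerance; bogofilter defines its own EPS elsewhere. */
#define EPS 1e-9

/* Smaller of two doubles (hypothetical stand-in for the min() used in the patch). */
static double min_d(double a, double b)
{
    return (a < b) ? a : b;
}

/*
 * Robinson f(w) for one token, with the token counts clamped to the
 * message counts so that neither b / msgs_bad nor g / msgs_good can
 * exceed 1.0.  robs and robx are passed in here for illustration.
 */
static double fw_clamped(double good, double bad,
                         double msgs_good, double msgs_bad,
                         double robs, double robx)
{
    double g, b, n, pw;

    /* The message counts are divisors below, so they must be positive. */
    assert(msgs_good > 0.0 && msgs_bad > 0.0);

    g = min_d(good, msgs_good);
    b = min_d(bad, msgs_bad);
    n = g + b;

    /* Same expressions as in the patched wordprob_result(). */
    pw = (n < EPS) ? 0.0 : ((b / msgs_bad) /
                            (b / msgs_bad + g / msgs_good));
    return (robs * robx + n * pw) / (robs + n);
}
```

With the .MSG_COUNT values above (8252 spam, 31411 good) and a hypothetical 
token counted 9000 times in spam and 40000 times in good mail, both counts 
are clamped to the message counts, so pw is 0.5 and, taking robx = 0.5, f(w) 
is 0.5 as well.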


Barry,

Patch is attached.  Before patching, run "make -s check" to verify that you 
can build and run without trouble on your machine.  Then apply the patch and 
run "make -s check" again to verify that there's been no regression.  Finally, 
test for the "extreme weirdness".

I'll be up for another hour or so.

David

-------------- next part --------------
Index: robinson.c
===================================================================
RCS file: /cvsroot/bogofilter/bogofilter/robinson.c,v
retrieving revision 1.25
diff -u -r1.25 robinson.c
--- robinson.c	19 Jan 2003 15:02:03 -0000	1.25
+++ robinson.c	21 Jan 2003 03:47:14 -0000
@@ -107,14 +107,13 @@
 
 static double wordprob_result(wordprob_t* wordstats)
 {
-    double fw, pw;
-    double g = wordstats->good;
-    double b = wordstats->bad;
+    double g = min(wordstats->good, msgs_good);
+    double b = min(wordstats->bad, msgs_bad);
     double n = g + b;
 
-    pw = (n < EPS) ? 0.0 : ((b / msgs_bad) / 
-			    (b / msgs_bad + g / msgs_good));
-    fw = (robs * robx + n * pw) / (robs + n);
+    double pw = (n < EPS) ? 0.0 : ((b / msgs_bad) / 
+				   (b / msgs_bad + g / msgs_good));
+    double fw = (robs * robx + n * pw) / (robs + n);
 
     return (fw);
 }
Index: rstats.c
===================================================================
RCS file: /cvsroot/bogofilter/bogofilter/rstats.c,v
retrieving revision 1.30
diff -u -r1.30 rstats.c
--- rstats.c	12 Jan 2003 14:02:00 -0000	1.30
+++ rstats.c	21 Jan 2003 03:47:14 -0000
@@ -219,17 +219,19 @@
 	rstats_t *cur = rstats_array[r];
 	const char *token = cur->token;
 	int len = max(0, MAXTOKENLEN-(int)strlen(token));
-	double n = cur->good + cur->bad;
-	double pw = ((n < EPS)
+	double g = min(cur->good, msgs_good);
+	double b = min(cur->bad, msgs_bad);
+	double n = g + b;
+	double pw = ((n < EPS) 
 		     ? 0.0
-		     : (pw = (cur->bad / msgs_bad) /
-			(cur->bad / msgs_bad + cur->good / msgs_good)));
+		     : ((b / msgs_bad) /
+			(b / msgs_bad + g / msgs_good)));
 	double fw = (robs * robx + n * pw) / (robs + n);
 	char flag = (fabs(fw-EVEN_ODDS) - min_dev >= EPS) ? '+' : '-';
 
 	(void)fprintf(stdout, "\"%s\"%*s %5d  %8.6f  %8.6f  %8.6f%10.5f%10.5f %c\n",
 		      token, len, " ",
-		      (int)n, cur->good / msgs_good, cur->bad / msgs_bad, 
+		      (int)n, g / msgs_good, b / msgs_bad, 
 		      fw, log(1.0 - fw), log(fw), flag);
     }
 


