extreme weirdness with RF & 0.10.0
David Relson
relson at osagesoftware.com
Tue Jan 21 04:53:04 CET 2003
At 10:28 PM 1/20/03, Barry Gould wrote:
>At 10:12 PM 1/20/2003 -0500, David Relson wrote:
>
>>What are the values of .MSG_COUNT? "bogofilter -w /path/to/wordlists
>>.MSG_COUNT" will give the info.
>
>              spam   good
>.MSG_COUNT    8252  31411
As I suspected: the token counts exceed the message counts, which pushes
ratios such as bad / msgs_bad above 1.
Barry,
A patch is attached. Before patching, run "make -s check" to verify that you
can build and run without trouble on your machine. Then apply the patch and
run "make -s check" again to verify that there's been no regression. Finally,
test for the "extreme weirdness".
I'll be up for another hour or so.
David
-------------- next part --------------
Index: robinson.c
===================================================================
RCS file: /cvsroot/bogofilter/bogofilter/robinson.c,v
retrieving revision 1.25
diff -u -r1.25 robinson.c
--- robinson.c 19 Jan 2003 15:02:03 -0000 1.25
+++ robinson.c 21 Jan 2003 03:47:14 -0000
@@ -107,14 +107,13 @@
 static double wordprob_result(wordprob_t* wordstats)
 {
-    double fw, pw;
-    double g = wordstats->good;
-    double b = wordstats->bad;
+    double g = min(wordstats->good, msgs_good);
+    double b = min(wordstats->bad, msgs_bad);
     double n = g + b;
-    pw = (n < EPS) ? 0.0 : ((b / msgs_bad) /
-                            (b / msgs_bad + g / msgs_good));
-    fw = (robs * robx + n * pw) / (robs + n);
+    double pw = (n < EPS) ? 0.0 : ((b / msgs_bad) /
+                                   (b / msgs_bad + g / msgs_good));
+    double fw = (robs * robx + n * pw) / (robs + n);
     return (fw);
 }
Index: rstats.c
===================================================================
RCS file: /cvsroot/bogofilter/bogofilter/rstats.c,v
retrieving revision 1.30
diff -u -r1.30 rstats.c
--- rstats.c 12 Jan 2003 14:02:00 -0000 1.30
+++ rstats.c 21 Jan 2003 03:47:14 -0000
@@ -219,17 +219,19 @@
     rstats_t *cur = rstats_array[r];
     const char *token = cur->token;
     int len = max(0, MAXTOKENLEN-(int)strlen(token));
-    double n = cur->good + cur->bad;
-    double pw = ((n < EPS)
+    double g = min(cur->good, msgs_good);
+    double b = min(cur->bad, msgs_bad);
+    double n = g + b;
+    double pw = ((n < EPS)
                  ? 0.0
-                 : (pw = (cur->bad / msgs_bad) /
-                    (cur->bad / msgs_bad + cur->good / msgs_good)));
+                 : ((b / msgs_bad) /
+                    (b / msgs_bad + g / msgs_good)));
     double fw = (robs * robx + n * pw) / (robs + n);
     char flag = (fabs(fw-EVEN_ODDS) - min_dev >= EPS) ? '+' : '-';
     (void)fprintf(stdout, "\"%s\"%*s %5d %8.6f %8.6f %8.6f%10.5f%10.5f %c\n",
                   token, len, " ",
-                  (int)n, cur->good / msgs_good, cur->bad / msgs_bad,
+                  (int)n, g / msgs_good, b / msgs_bad,
                   fw, log(1.0 - fw), log(fw), flag);
 }