Minimum usable counts [was: Question]

David Relson relson at osagesoftware.com
Thu May 21 05:16:05 CEST 2009


On Thu, 21 May 2009 12:01:41 +0930
Stephen Davies wrote:

> I understand.
> 
> My initial issue is with the obvious spams not being detected first
> time round.
> The first I see of them is in my inbox as ham - despite being so
> obviously spam.
> 
> If I save the email and run it through bogofilter -vvv, I get the
> results I posted.
> 
> I then use bogofilter -Ns to "fix" the database and this seems to
> work - until the next spam with the same pattern but from a different
> source arrives. (bogofilter -vvv at this stage gives bogosity of 1.0).
> 
> I have changed my min-dev, robx and robs to 0.35, 0.7, 0.1 but first 
> indications are that this is not enough.

...[snip]...

Hi Stephen,

'Tis an interesting idea to skip scoring tokens whose combined spam and
ham counts are low.  As an experiment, the attached patch for src/score.c
ignores any token for which cnts->good + cnts->bad < 3, i.e. a token seen
fewer than three times in total.  Give it a try and let me know what you
think of it.

Regards,

David

P.S.  If the patch works for you, we'll need a good name for the
option.  Any suggestions?
-------------- next part --------------
Index: score.c
===================================================================
--- score.c	(revision 6816)
+++ score.c	(working copy)
@@ -273,7 +273,11 @@
 	cnts  = &props->cnts;
 	props->prob = calc_prob(cnts->good, cnts->bad,
 				cnts->msgs_good, cnts->msgs_bad);
-	props->used = fabs(props->prob - EVEN_ODDS) > min_dev;
+	if ( ( fabs(props->prob - EVEN_ODDS) <= min_dev ) ||
+	     ( cnts->good + cnts->bad < 3 ) )
+	    props->used = false;
+	else
+	    props->used = true;
 	if (props->used)
 	    count += 1;
     }
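
To see why a count threshold matters with the settings quoted above
(min-dev 0.35, robx 0.7, robs 0.1), here is a minimal standalone sketch.
It is not bogofilter code.  sketch_prob() is a hypothetical stand-in for
calc_prob() and assumes Robinson's smoothed estimate
f(w) = (robs*robx + n*p(w)) / (robs + n); the real function may differ in
detail.  sketch_used() combines the deviation test, written with the
pre-patch ">" comparison, with the count test; its min_count parameter is
also hypothetical and replaces the hard-coded 3.

```c
#include <math.h>
#include <stdbool.h>

#define EVEN_ODDS 0.5

/* Hypothetical stand-in for calc_prob(): Robinson's smoothed estimate.
 * robx is the prior for unseen tokens, robs the weight given to it.
 * This is an assumption for illustration, not the actual source. */
static double sketch_prob(unsigned good, unsigned bad,
                          unsigned msgs_good, unsigned msgs_bad,
                          double robs, double robx)
{
    unsigned n = good + bad;
    if (n == 0)
        return robx;            /* never seen: fall back to the prior */

    /* Per-corpus frequencies; guard against empty corpora. */
    double g = (double)good / (msgs_good ? msgs_good : 1);
    double b = (double)bad  / (msgs_bad  ? msgs_bad  : 1);
    double pw = b / (g + b);    /* raw spam probability of the token */

    return (robs * robx + n * pw) / (robs + n);
}

/* A token is scored only if it is far enough from even odds AND has
 * been seen at least min_count times.  min_count == 0 reproduces the
 * behaviour before the patch. */
static bool sketch_used(double prob, unsigned good, unsigned bad,
                        double min_dev, unsigned min_count)
{
    return fabs(prob - EVEN_ODDS) > min_dev &&
           good + bad >= min_count;
}
```

Under those assumptions, a token seen once in spam and never in ham gets
f(w) = (0.1*0.7 + 1*1.0) / 1.1, roughly 0.97.  That is 0.47 away from even
odds, so min-dev 0.35 alone does not exclude it, whereas
sketch_used(p, 0, 1, 0.35, 3) returns false.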

