Minimum usable counts [was: Question]
David Relson
relson at osagesoftware.com
Thu May 21 05:16:05 CEST 2009
On Thu, 21 May 2009 12:01:41 +0930
Stephen Davies wrote:
> I understand.
>
> My initial issue is with the obvious spams not being detected first
> time round.
> The first I see of them is in my inbox as ham - despite being so
> obviously spam.
>
> If I save the email and run it through bogofilter -vvv, I get the
> results I posted.
>
> I then use bogofilter -Ns to "fix" the database and this seems to
> work - until the next spam with the same pattern but from a different
> source arrives. (bogofilter -vvv at this stage gives bogosity of 1.0).
>
> I have changed my min-dev, robx and robs to 0.35, 0.7, 0.1 but first
> indications are that this is not enough.
...[snip]...
Hi Stephen,
'Tis an interesting idea to skip scoring tokens whose combined spam and
ham counts are low. As an experiment, the attached patch for src/score.c
ignores tokens for which good_count + bad_count < 3. Give it a try and
let me know what you think of it.
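
For anyone who wants to try the rule outside bogofilter, here is a minimal
standalone sketch of the same test. The helper name token_used, the
min_count parameter, and the EVEN_ODDS value of 0.5 are assumptions made
for illustration; the patch itself hard-codes the threshold 3 inside
src/score.c. The sketch keeps the original strict "> min_dev" comparison,
so a token exactly min_dev away from even odds stays unused.

```c
#include <math.h>
#include <stdbool.h>

/* Assumed value for illustration: the neutral probability that the
 * patch refers to as EVEN_ODDS. */
#define EVEN_ODDS 0.5

/* Hypothetical helper, not part of bogofilter.
 * Returns true when a token should take part in scoring:
 *   - its probability deviates from EVEN_ODDS by more than min_dev, and
 *   - it has been seen at least min_count times (ham + spam combined). */
bool token_used(double prob, unsigned good, unsigned bad,
                double min_dev, unsigned min_count)
{
    /* Too close to neutral: the token carries little information. */
    if (fabs(prob - EVEN_ODDS) <= min_dev)
        return false;

    /* Too few sightings: the probability estimate is not yet reliable. */
    if (good + bad < min_count)
        return false;

    return true;
}
```

With min_dev 0.35 and min_count 3, a token seen once as spam and never as
ham is skipped however extreme its probability, while a token seen five
times as spam with probability 0.9 is scored.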
Regards,
David
P.S. If the patch works for you, we'll need a good name for the
option. Any suggestions?
-------------- next part --------------
Index: score.c
===================================================================
--- score.c (revision 6816)
+++ score.c (working copy)
@@ -273,7 +273,11 @@
     cnts = &props->cnts;
     props->prob = calc_prob(cnts->good, cnts->bad,
                             cnts->msgs_good, cnts->msgs_bad);
-    props->used = fabs(props->prob - EVEN_ODDS) > min_dev;
+    if ( ( fabs(props->prob - EVEN_ODDS) <= min_dev ) ||
+         ( cnts->good + cnts->bad < 3 ) )
+        props->used = false;
+    else
+        props->used = true;
     if (props->used)
         count += 1;
 }