the importance of robx
David Relson
relson at osagesoftware.com
Sun Feb 29 02:21:13 CET 2004
On 28 Feb 2004 19:57:42 -0500
Tom Anderson wrote:
> On Sat, 2004-02-28 at 19:32, David Relson wrote:
> > cnt  rs      md     rx     cutoff    fp  fn
> >   1  1.0000  0.050  0.439  0.929204  23  1868
> >   2  1.0000  0.050  0.389  0.906199  23  1971
> >   3  1.0000  0.050  0.489  0.973740  23  1501
> >   4  1.0000  0.050  0.339  0.867208  23  2034
> >   5  1.0000  0.050  0.539  0.977234  23  1463
>
> Your min_dev is very, very small. The default in
> bogofilter.cf.example is 0.1. I've increased mine to 0.2 in order to
> further depend only on tried-and-true tokens. Your robx is affecting
> your classifications because it is further from 0.5 than your min_dev.
> Notice your false negatives went way down when you set robx to 0.539,
> which is closer to 0.5 than min_dev. Try changing min_dev to 0.2 and
> run the same test again.
Tom,
0.050 is _not_ my min_dev. The 5 lines above are the first 5 lines of
bogotune's 225-line coarse scan.
Bogotune uses the following values for its coarse scan:
rsval: 5 (1.0000, 0.3162, 0.1000, 0.0316, 0.0100)
rxval: 5 (0.335, 0.385, 0.435, 0.485, 0.535)
mdval: 9 (0.050, 0.100, 0.150, 0.200, 0.250, 0.300, 0.350, 0.400, 0.450)
The fine scan uses approx 100 to 250 values centered around the optimum
parameter set determined in the coarse scan.
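
As a quick check on where the 225 lines come from, here is a minimal Python sketch that enumerates the coarse-scan grid from the three value lists above. The variable names and the loop order (rx varying fastest) are my own assumptions for illustration; this is not bogotune's code, and bogotune may visit the combinations in a different order.

```python
from itertools import product

# Coarse-scan values as listed above (names are illustrative only).
rsval = [1.0000, 0.3162, 0.1000, 0.0316, 0.0100]
rxval = [0.335, 0.385, 0.435, 0.485, 0.535]
mdval = [0.050, 0.100, 0.150, 0.200, 0.250, 0.300, 0.350, 0.400, 0.450]

# One (rs, md, rx) tuple per coarse-scan line; iteration order is assumed.
coarse_grid = list(product(rsval, mdval, rxval))

# 5 rs values * 9 md values * 5 rx values
print(len(coarse_grid))
```

The count is simply 5 x 9 x 5, so every coarse-scan line corresponds to one combination of rs, md and rx.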
> Its score is not robx, because robx is within the min_dev zone. It
> has no score at all. What spamicity does a message have if none of
> the tokens were used to classify it?
Try "bogofilter -v < /dev/null" to see the score of a message with no
tokens.
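
To make the "min_dev zone" in this exchange concrete, here is a rough Python sketch of the rule as discussed: a token only contributes when its score is at least min_dev away from 0.5. The names `EVEN_ODDS` and `is_used` are hypothetical, the handling of the exact boundary is an assumption, and the sketch assumes a never-seen token scores exactly robx. It is not bogofilter's actual code.

```python
EVEN_ODDS = 0.5  # neutral score; hypothetical constant name


def is_used(score: float, min_dev: float) -> bool:
    """Return True if a token score lies outside the min_dev zone.

    Assumption: scores exactly min_dev away from 0.5 count as used.
    """
    return abs(score - EVEN_ODDS) >= min_dev


# The five rx values from the quoted scan lines, all with md = 0.050.
min_dev = 0.050
for robx in (0.439, 0.389, 0.489, 0.339, 0.539):
    status = "used" if is_used(robx, min_dev) else "ignored"
    print(f"robx={robx:.3f} deviation={abs(robx - EVEN_ODDS):.3f} -> {status}")
```

Under these assumptions, an unknown token scored at 0.539 or 0.489 falls inside the zone for min_dev 0.050 and is ignored, while one scored at 0.439 is not.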
> > It's virtually impossible to have a message composed entirely of new
> > words. After all, your email address appears in several tokens in
> > bogofilter's parsing :-)
>
> If I set up an account for a new user and enable bogofilter in their
> ~/.procmailrc without generating any database, the initial database
> will be completely blank, thus all words will be new.
True. But that's a starting condition, which is not typical. Once a
message has been registered, tokens will start to be recognized.