the importance of robx

David Relson relson at osagesoftware.com
Sun Feb 29 02:21:13 CET 2004


On 28 Feb 2004 19:57:42 -0500
Tom Anderson wrote:

> On Sat, 2004-02-28 at 19:32, David Relson wrote:
> > cnt    rs    md     rx  cutoff  fp  fn
> >   1 1.0000 0.050 0.439 0.929204 23 1868
> >   2 1.0000 0.050 0.389 0.906199 23 1971
> >   3 1.0000 0.050 0.489 0.973740 23 1501
> >   4 1.0000 0.050 0.339 0.867208 23 2034
> >   5 1.0000 0.050 0.539 0.977234 23 1463
> 
> Your min_dev is very, very small.  The default in
> bogofilter.cf.example is 0.1.  I've increased mine to 0.2 in order to
> further depend only on tried-and-true tokens.  Your robx is affecting
> your classifications because it is further from 0.5 than your min_dev.
> Notice your false negatives went way down when you set robx to 0.539,
> which is closer to 0.5 than min_dev.  Try changing min_dev to 0.2 and
> run the same test again.
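
To make the min_dev argument concrete, here is a minimal sketch in Python.
It is not bogofilter's code: the helper name is_used and the ">="
comparison at the boundary are assumptions for illustration, and it takes
for granted that an unknown token is scored as robx.  It checks which of
the five rx values from the table above lie outside a given min_dev band
around 0.5.

```python
EVEN_ODDS = 0.5  # neutral score; min_dev is measured from here


def is_used(score, min_dev):
    """Return True if a token score is far enough from 0.5 to count.

    Hypothetical helper: the boundary comparison (>=) is an assumption,
    not taken from bogofilter's source.
    """
    return abs(score - EVEN_ODDS) >= min_dev


# rx values from the first five lines of the scan output quoted above
robx_values = [0.439, 0.389, 0.489, 0.339, 0.539]

for min_dev in (0.05, 0.2):
    used = [rx for rx in robx_values if is_used(rx, min_dev)]
    print(min_dev, used)
```

With min_dev at 0.05, the values 0.489 and 0.539 fall inside the band and
would be ignored; with min_dev at 0.2, all five would be.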

Tom,

0.050 is _not_ my min_dev.  The 5 lines above are the first 5 lines of
bogotune's 225-line coarse scan.

Bogotune uses the following values for its coarse scan:

  rsval:  5  (1.0000, 0.3162, 0.1000, 0.0316, 0.0100)
  rxval:  5  (0.335, 0.385, 0.435, 0.485, 0.535)
  mdval:  9  (0.050, 0.100, 0.150, 0.200, 0.250, 0.300, 0.350, 0.400, 0.450)
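
The 225 figure follows from these three lists: 5 x 5 x 9 combinations.
A small Python sketch of the grid, for illustration only (the variable
names mirror the labels above, but the iteration order is an assumption
and need not match the order in which bogotune actually runs the scan):

```python
from itertools import product

# Coarse-scan values as listed above
rsval = [1.0000, 0.3162, 0.1000, 0.0316, 0.0100]
rxval = [0.335, 0.385, 0.435, 0.485, 0.535]
mdval = [0.050, 0.100, 0.150, 0.200, 0.250, 0.300, 0.350, 0.400, 0.450]

# Every (rs, md, rx) combination; the ordering here is arbitrary
coarse_grid = list(product(rsval, mdval, rxval))

print(len(coarse_grid))  # 5 * 9 * 5 = 225
```

The rs values are spaced by half a decade (10^0, 10^-0.5, ... 10^-2),
while rx and md step linearly by 0.05.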

The fine scan uses approx 100 to 250 values centered around the optimum
parameter set determined in the coarse scan.

> Its score is not robx, because robx is within the min_dev zone.  It
> has no score at all.  What spamicity does a message have if none of
> the tokens were used to classify it?

Try "bogofilter -v < /dev/null" to see the score of a message with no
tokens.

> > It's virtually impossible to have a message composed entirely of new
> > words.  After all, your email address appears in several tokens in
> > bogofilter's parsing :-)
> 
> If I set up an account for a new user and enable bogofilter in their
> ~/.procmailrc without generating any database, the initial database
> will be completely blank, thus all words will be new.

True.  But that's a starting condition, which is not typical.  Once a
message has been registered, tokens will start to be recognized.



