the importance of robx

Tom Anderson tanderso at oac-design.com
Sun Feb 29 01:57:42 CET 2004


On Sat, 2004-02-28 at 19:32, David Relson wrote:
> cnt    rs    md     rx  cutoff  fp  fn
>   1 1.0000 0.050 0.439 0.929204 23 1868
>   2 1.0000 0.050 0.389 0.906199 23 1971
>   3 1.0000 0.050 0.489 0.973740 23 1501
>   4 1.0000 0.050 0.339 0.867208 23 2034
>   5 1.0000 0.050 0.539 0.977234 23 1463

Your min_dev is very, very small.  The default in bogofilter.cf.example
is 0.1.  I've increased mine to 0.2 so that the score depends only on
tried-and-true tokens.  Your robx is affecting your classifications
whenever it is further from 0.5 than your min_dev.  Notice that your
false negatives went way down when you set robx to 0.539, which is only
0.039 away from 0.5 and so inside your min_dev of 0.05.  Try changing
min_dev to 0.2 and run the same test again.
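
To make the comparison concrete, here is a rough sketch of the check I
mean, run over the rx column from the table above.  The function name
and the ">=" boundary are my own assumptions for illustration; this is
not taken from the bogofilter source.

```python
# Sketch only: does a never-seen token (scored at robx) fall outside the
# min_dev band around 0.5 and therefore take part in classification?
# unseen_token_counts and the ">=" boundary are illustrative assumptions.

def unseen_token_counts(robx, min_dev):
    """True if |robx - 0.5| is at least min_dev."""
    return abs(robx - 0.5) >= min_dev

# The rx column from the quoted test run.
ROBX_VALUES = [0.439, 0.389, 0.489, 0.339, 0.539]

for min_dev in (0.05, 0.2):
    used = [rx for rx in ROBX_VALUES if unseen_token_counts(rx, min_dev)]
    print(f"min_dev={min_dev}: robx values that still count: {used}")
```

With min_dev at 0.05, only 0.489 and 0.539 sit inside the band; with
min_dev at 0.2, all five robx values do.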

> > That brings up the question... what happens if you have a message
> > composed entirely of new words?
> 
> Its score is robx.  Robx should be less than spam_cutoff so that the
> message is considered ham, not spam.  This is desirable because it's

Its score is not robx, because robx is within the min_dev zone.  It has
no score at all.  What spamicity does a message have if none of the
tokens were used to classify it?
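
A small sketch of the situation I am describing, assuming Robinson's
f(w) = (robs*robx + n*p) / (robs + n) for the per-token score and a
simple min_dev filter.  The helper names are hypothetical, robs=1.0 and
robx=0.439 come from the first row of your table, and min_dev=0.2 is my
own setting; the real code may well differ in detail.

```python
# Sketch only: a message made up entirely of never-seen tokens.
# f_w and tokens_used are hypothetical helpers, not bogofilter functions.

def f_w(n, p, robs, robx):
    """Robinson's smoothed token score; n = times seen, p = raw estimate."""
    return (robs * robx + n * p) / (robs + n)

def tokens_used(scores, min_dev):
    """Keep only scores at least min_dev away from 0.5."""
    return [s for s in scores if abs(s - 0.5) >= min_dev]

robs, robx, min_dev = 1.0, 0.439, 0.2

# Ten new tokens: n = 0, so p is irrelevant and every score is robx.
new_message = [f_w(0, 0.0, robs, robx) for _ in range(10)]
used = tokens_used(new_message, min_dev)

print("per-token score:", new_message[0])
print("tokens left to combine:", len(used))
```

Every token scores exactly robx, and none of them survives the min_dev
filter, so there is nothing left to combine into a spamicity.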

> It's virtually impossible to have a message composed entirely of new
> words.  After all, your email address appears in several tokens in
> bogofilter's parsing :-)

If I set up an account for a new user and enable bogofilter in their
~/.procmailrc without generating any database, the initial database will
be completely blank, thus all words will be new.

In addition, tokens such as your email address ought to be within the
min_dev zone.  Therefore, every token other than the neutral ones may
still be new, even with an established database.

Tom

