comments from a new user

Greg Louis glouis at dynamicro.on.ca
Sat May 10 13:57:50 CEST 2003


On 20030509 (Fri) at 2313:08 -0400, Andrew Pimlott wrote:
> On Fri, May 09, 2003 at 09:08:02PM -0400, David Relson wrote:
> > At 08:34 PM 5/9/03, Andrew Pimlott wrote:
> > >- The FAQ has an obviously wrong explanation of pgood and pbad.  It calls
> > >  pgood the "likelihood that a message containing this token is non-spam"
> > >  when (I think) it means the "likelihood that a non-spam message contains
> > >  this token".
> > 
> > This is debatable.  What's presently in the FAQ isn't great, but the 
> > correct wording isn't obvious.
> 
> According to the FAQ, pgood and pbad must trivially add up to 1,
> which is clearly not true.  :-)

You are incorrect here.  "The likelihood that a message containing this
token is spam" can be, say, 0.1, and the likelihood that a message
containing this same token is nonspam can be 0.05.  There is nothing in
the description (which I didn't like when I first saw it, but which in
fact is more accurate than any other we've come up with) that implies
that the two should add to 1.  You mistakenly suppose that pgood and
pbad are probabilities; the choice of the word "likelihood" is
deliberate, and is meant to imply that they represent only very rough
estimates of probability based on limited information.

>     "the likelihood (extrapolated from your registered non-spam
>     messages) that a non-spam message contains this token"

That would be extremely inaccurate.  pgood tells us nothing about the
likelihood that a nonspam message contains the token; it addresses the
likelihood that a message containing the token is nonspam.  There may
be, and likely are, millions of other nonspams that do not contain the
token, and pgood has no bearing on that number.

-- 
| G r e g  L o u i s          | gpg public key: finger     |
|   http://www.bgl.nu/~glouis |   glouis at consultronics.com |
| http://wecanstopspam.org in signatures fights junk email |
