Is bogofilter Bayesian?

Greg Louis glouis at dynamicro.on.ca
Wed Feb 11 12:45:25 CET 2004


On 20040211 (Wed) at 0947:07 +0100, Boris 'pi' Piwinger wrote:

> > Using Bayesian classification for email in the way we do, with full
> > training, violates two assumptions on which Bayesian classification is
> > based: independence of tokens within messages and uniform distribution
> > of scores.  In discussing training on error and training to
> > exhaustion, we don't mention that.
> 
> Right. What do you think about an FAQ entry on all of those
> effects? That can list all concerns and possible
> interpretations.

I don't feel strongly about it.  On the one hand, it would perhaps be
read by a few people who wouldn't bother to follow the links to Paul's
and Gary's well-articulated descriptions.  On the other, the
information is available, in those papers and others, and repeating it
in the FAQ may be unnecessary duplication.  For myself, I'd prefer
linking to authoritative material; it's actually quite difficult to
explain this stuff in a way that both statisticians and
non-statisticians will find satisfactory.  Gary does a pretty good job.

-- 
| G r e g  L o u i s         | gpg public key: 0x400B1AA86D9E3E64 |
|  http://www.bgl.nu/~glouis |   (on my website or any keyserver) |
|  http://wecanstopspam.org in signatures helps fight junk email. |



