Is bogofilter Bayesian?

Greg Louis glouis at dynamicro.on.ca
Tue Feb 10 13:23:46 CET 2004


On 20040210 (Tue) at 1005:48 +0100, Boris 'pi' Piwinger wrote:

> 1) Is bogofilter Bayesian? It certainly uses those methods
> at some point, but not throughout the computation, in
> particular the Fisher method (using the chi function) would
> not be it. So he suggests to call such filters "adaptive" or
> "statistical", where the latter includes Bayesian.

Strictly speaking, the "Bayesian" spam filters accept the assumptions
on which Bayesian statistical methodology is based, even though email
grossly violates those assumptions.  IIRC Gary at one point felt (I
don't know if he still does) that the violations in question
(principally non-independence of tokens within messages) might in fact
help rather than hinder discrimination.  At any rate, the filters
(bogofilter included) are as Bayesian as nut bread is nuts (I happen to
have a loaf of nut bread in the bread machine at the moment, hence the
comparison) -- the term derives from a component that's responsible for
the principal flavour ;)
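
For readers unfamiliar with the "Fisher method (using the chi function)"
mentioned in the question, here is a minimal sketch of chi-square
combining of per-token spam probabilities, along the lines Gary
described as I understand them.  The names chi2q and fisher_score and
the sample probabilities are illustrative assumptions, not bogofilter's
actual code.

```python
import math

def chi2q(x2, dof):
    """Upper-tail probability of a chi-square value for an EVEN number of
    degrees of freedom (closed-form series, no external library needed)."""
    m = x2 / 2.0
    term = math.exp(-m)
    total = term
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)

def fisher_score(token_probs):
    """Combine per-token spam probabilities (each strictly between 0 and 1)
    into one indicator: near 1 looks spammy, near 0 looks hammy,
    near 0.5 is undecided."""
    n = len(token_probs)
    # Large when the token probabilities are collectively high.
    h = chi2q(-2.0 * sum(math.log(p) for p in token_probs), 2 * n)
    # Large when the token probabilities are collectively low.
    s = chi2q(-2.0 * sum(math.log(1.0 - p) for p in token_probs), 2 * n)
    return (1.0 + h - s) / 2.0

# Hypothetical token probabilities, for illustration only.
print(fisher_score([0.99, 0.95, 0.98, 0.90]))   # mostly spammy tokens
print(fisher_score([0.02, 0.10, 0.05, 0.01]))   # mostly hammy tokens
print(fisher_score([0.5, 0.5, 0.5, 0.5]))       # uninformative tokens
```

Note that the combining step treats the per-token values as independent,
which is exactly the assumption that real email violates.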

> In his opinion the choice of messages for
> training on error also does no harm to this concept, hence
> the warning would be inappropriate.

I'd be surprised if he were to confirm that your interpretation of his
opinion is accurate here.  The fact that we can't avoid violating the
assumptions on which Bayesian classification is based is no reason
to multiply such violations deliberately, nor does it offer any
protection against the loss of accuracy that doing so can cause.  However,
Gary pointed out in his first paper that pristine Bayesian validity is
unachievable, and you and I have agreed, on this list, with his
principle that what works matters more than what's statistically pure.
That said, the question becomes moot; what matters is whether your
methods give good results in other hands than your own, and how well
they scale.

I prefer, however, not to (mis)lead people into thinking it doesn't
matter what you train with, nor how many times you do so; I would be
sorry to see the warning removed.
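
A toy illustration of why the number of times a message is registered
matters.  The estimate below is a deliberately simplified per-token
ratio, assumed here for illustration only (it is not bogofilter's actual
calculation), and all counts are made up.

```python
def token_spam_prob(spam_hits, spam_msgs, ham_hits, ham_msgs):
    """Simplified estimate of how spammy a token is: its rate of
    occurrence in spam relative to its combined rate in spam and ham."""
    spam_rate = spam_hits / spam_msgs
    ham_rate = ham_hits / ham_msgs
    return spam_rate / (spam_rate + ham_rate)

# Hypothetical starting point: the token appears in 2 of 100 spams
# and in 2 of 100 hams, so it carries no information.
baseline = token_spam_prob(2, 100, 2, 100)

# Register one more spam containing the token, once.
once = token_spam_prob(3, 101, 2, 100)

# Register that same spam ten times over.
ten_times = token_spam_prob(12, 110, 2, 100)

print(baseline, once, ten_times)
```

The same single message pushes the token's estimate much further when
it is registered repeatedly, although no new evidence has been seen.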

-- 
| G r e g  L o u i s         | gpg public key: 0x400B1AA86D9E3E64 |
|  http://www.bgl.nu/~glouis |   (on my website or any keyserver) |
|  http://wecanstopspam.org in signatures helps fight junk email. |



