Is bogofilter Bayesian?

Tue Feb 10 13:25:00 CET 2004

On Tue, 10 Feb 2004 10:05:48 +0100
Boris 'pi' Piwinger wrote:

> Hi!
> 
> I just had some (private) discussion with Gary Robinson.
> 
> When reading his Linux Journal article (see pointer in the
> man page) about the Fisher method and the way to it, there
> are two things which don't fit our documentation.
> 
> 1) Is bogofilter Bayesian? It certainly uses those methods
> at some point, but not throughout the computation; in
> particular, the Fisher method (using the chi function) is
> not Bayesian. So he suggests calling such filters "adaptive"
> or "statistical", where the latter includes Bayesian.
> 
> 2) We have some discussion in the FAQ about general
> statistical assumptions in the context of train on error. As
> he discusses in his article, we don't really want to compute
> probabilities, but to refute the null hypothesis that the
> message is just a random collection of independent words
> with probabilities given by our estimates. For example, he
> makes the point that mails are not at all such a collection,
> simply because of the nature of language. So the assumption
> mentioned in the FAQ is already broken there. In his opinion
> the choice of messages for training on error also does no
> harm to this concept, hence the warning would be
> inappropriate.
> 
> See http://www.garyrobinson.net/2004/02/spam_filtering_.html
> for some details.
> 
> pi
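
For readers who have not seen the article, the Fisher combination
mentioned above can be sketched roughly as follows. This is only an
illustration of the general technique, not bogofilter's actual code:
the names chi2_tail and fisher_indicator are made up for this example,
and the per-word estimates are assumed to lie strictly between 0 and 1.
The idea is to test the null hypothesis that the words are independent
with the estimated probabilities, once from the spam side and once from
the ham side, and to combine the two tail values into one indicator.

```python
import math


def chi2_tail(chi, df):
    """Upper-tail probability of a chi-square value with an even df.

    Uses the closed-form series that exists for even degrees of freedom.
    """
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    m = chi / 2.0
    term = math.exp(-m)
    total = term
    for i in range(1, df // 2):
        term *= m / i
        total += term
    # Guard against rounding pushing the sum slightly above 1.
    return min(total, 1.0)


def fisher_indicator(word_probs):
    """Combine per-word spam estimates (each strictly in (0, 1)).

    Returns a value near 1 for spam-like input, near 0 for ham-like
    input, and near 0.5 when the evidence is balanced.
    """
    n = len(word_probs)
    if n == 0:
        return 0.5  # assumption: no evidence means "unsure"
    # Fisher statistic over the estimates themselves ...
    h = chi2_tail(-2.0 * sum(math.log(p) for p in word_probs), 2 * n)
    # ... and over their complements.
    s = chi2_tail(-2.0 * sum(math.log(1.0 - p) for p in word_probs), 2 * n)
    return (1.0 + h - s) / 2.0


if __name__ == "__main__":
    print(fisher_indicator([0.99] * 10))  # spam-like words
    print(fisher_indicator([0.01] * 10))  # ham-like words
    print(fisher_indicator([0.5] * 10))   # uninformative words
```

Note that neither tail value is treated as "the probability that the
message is spam"; each one only says how implausible the null
hypothesis is from one side, which is the point made in 2) above.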

Hi pi,

'Tis a nice write-up.  Well done!

David