Is bogofilter Bayesian?

Tue Feb 10 10:05:48 CET 2004

Hi!

I just had some (private) discussion with Gary Robinson.

When reading his Linux Journal article (see the pointer in
the man page) about the Fisher method and how he arrived at
it, I noticed two things which don't fit our documentation.

1) Is bogofilter Bayesian? It certainly uses Bayesian
methods at some point, but not throughout the computation;
in particular, the Fisher method (using the chi-square
function) is not Bayesian. So he suggests calling such
filters "adaptive" or "statistical", where the latter
includes Bayesian.

2) We have some discussion in the FAQ about general
statistical assumptions in the context of train on error. As
he discusses in his article, we don't really want to compute
probabilities, but to refute the null hypothesis that the
message is just a random collection of independent words
with probabilities given by our estimates. For
example, he makes the point that the words in a mail are not
at all such a collection, simply because of the nature of
language. So the assumption mentioned in the FAQ is already
broken at that point. In his opinion, the choice of messages
for training on error does no harm to this concept either,
hence the warning would be inappropriate.
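
To make the "refute the null hypothesis" idea concrete, here
is a rough Python sketch of Fisher's method. It is not
bogofilter's actual code. The function names (chi2q,
fisher_combine, spam_indicator) and the way the two tail
values are merged into one indicator are my own assumptions
for illustration.

```python
import math

def chi2q(x2, df):
    """P(X >= x2) for a chi-square variable with an even df.

    For even df the tail probability has a closed form:
    exp(-m) * sum_{i < df/2} m**i / i!  with m = x2 / 2.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    m = x2 / 2.0
    term = math.exp(-m)
    total = term
    for i in range(1, df // 2):
        term *= m / i
        total += term
    return min(total, 1.0)  # guard against rounding above 1

def fisher_combine(probs):
    """Fisher's method: under the null hypothesis (independent,
    uniformly distributed values), -2 * sum(ln p) follows a
    chi-square distribution with 2n degrees of freedom.
    Each p must lie strictly between 0 and 1."""
    x2 = -2.0 * sum(math.log(p) for p in probs)
    return chi2q(x2, 2 * len(probs))

def spam_indicator(word_probs):
    """Hypothetical indicator in [0, 1]; near 1 suggests spam.

    word_probs holds the per-word estimates f(w). Each side
    asks how surprising the product is if the words were a
    random collection, once for f(w) and once for 1 - f(w)."""
    p_low = fisher_combine(word_probs)
    p_high = fisher_combine([1.0 - f for f in word_probs])
    return (1.0 + p_low - p_high) / 2.0
```

The results are not probabilities that the mail is spam.
They only say how hard it is to keep believing the null
hypothesis, which is the point made in the article. With
estimates that are all 0.5 the two sides cancel and the
indicator stays at 0.5.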

See http://www.garyrobinson.net/2004/02/spam_filtering_.html
for some details.

pi