Artificial Intelligence

Jonathan Buzzard jonathan at buzzard.org.uk
Thu Sep 19 23:32:53 CEST 2002


aotto at aotto.com said:
> 3) Microsoft holds a patent for a Bayesian spam filtering system.
> Although bogofilter does not fall under the terms of this patent, the
> argument would be much more simple if we simply did not use the word
> "Bayesian" to describe the system. 

I am no fan of software patents, but this is the absolute worst kind of
patent that exists. There is no inventive step whatsoever in this patent.
Firstly they acknowledge in the patent that spam is a textual classification
problem and they note at least one existing paper that says so.

They then go on to list the range of features that might be extracted
from the email for classification purposes. Nothing new here either;
rule-based classifiers do that. They describe how the classifier can
be trained on a training set and how this can be updated on the basis of
new material. Well blow me down if those working with machine learning
classifiers have not been doing that from day one.
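
As a rough illustration of how ordinary this train-then-update loop is,
here is a minimal sketch in Python of a word-count naive Bayes classifier.
It is neither the method claimed in the patent nor bogofilter's algorithm;
the class name TinyNaiveBayes, the tokenizer and the smoothing constants
are all assumptions made for this example.

```python
import math
import re
from collections import Counter


class TinyNaiveBayes:
    """Hypothetical two-class classifier; not bogofilter's algorithm."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    @staticmethod
    def features(text):
        # Feature extraction: lowercase word tokens (an assumed tokenizer).
        return re.findall(r"[a-z0-9']+", text.lower())

    def train(self, text, label):
        # May be called at any time, so new material updates the model.
        self.word_counts[label].update(self.features(text))
        self.message_counts[label] += 1

    def spam_probability(self, text):
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_messages = sum(self.message_counts.values())
        log_score = {}
        for label in ("spam", "ham"):
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            # Add-one smoothing on the class prior.
            score = math.log(
                (self.message_counts[label] + 1) / (total_messages + 2)
            )
            for word in self.features(text):
                # Add-one smoothing, with one extra slot for unseen words.
                score += math.log(
                    (counts[word] + 1) / (total_words + len(vocab) + 1)
                )
            log_score[label] = score
        # Turn the two log scores into a probability; clamp to avoid overflow.
        diff = max(-700.0, min(700.0, log_score["ham"] - log_score["spam"]))
        return 1.0 / (1.0 + math.exp(diff))
```

Calling train() again with freshly labelled mail is all that "updating on
the basis of new material" amounts to in this sketch.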

Finally they go on to list a range of pre-existing Bayesian
classifiers and other classifiers including neural networks and
decision trees, basically just about every known sort of classifier
in existence (well, all the ones I know about anyway). They even
mention using multiple classifiers and combining their outputs.
Heavens above, I was doing that back in 1993 and it was not new then
by any stretch of the imagination.
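
Combining classifier outputs can be as simple as the sketch below, in
Python. The function names combine_scores and majority_vote, the 0.5
threshold and the assumption that each classifier emits a spam score
between 0 and 1 are illustrative only; they are not taken from the patent
or from bogofilter.

```python
def combine_scores(scores, weights=None):
    """Weighted average of per-classifier spam scores in [0, 1]."""
    if not scores:
        raise ValueError("need at least one classifier score")
    if weights is None:
        weights = [1.0] * len(scores)
    if len(weights) != len(scores):
        raise ValueError("need exactly one weight per score")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * s for w, s in zip(weights, scores)) / total


def majority_vote(scores, threshold=0.5):
    """True if more than half of the classifiers score above threshold."""
    votes = sum(1 for s in scores if s > threshold)
    return votes * 2 > len(scores)
```

For example, three hypothetical classifiers scoring a message 0.9, 0.6
and 0.3 can be averaged with combine_scores([0.9, 0.6, 0.3]) or put to a
vote with majority_vote([0.9, 0.6, 0.3]).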

However, the patent then goes on to describe a Support Vector Machine
classifier. I am not a Bayesian expert, but I believe that the
classifier described is not a Bayesian one.

In a last gasp the patent goes on to say that their supposed inventive
classifier can be used to classify any electronic message.

This patent really is a perfect example of all that is wrong with the
patent system in the U.S.A. All that has been done is to take a set of
standard tools that those in the field of machine classification have been
using for years (in textual classification, machine vision, and other
areas), say that they will be good for classifying spam, and be
granted a patent.

I cannot believe that even trivial searches of the appropriate journals
will not turn up sufficient prior art of textual classification to make
this patent null and void. In fact just reading the paper titles in
the "Other references" section one is astounded that these don't
show sufficient prior art, in particular:

   M. Iwayama et al, "Hierarchical Bayesian Clustering for Automatic Text
   Classification", Natural Language, 1995.
   Thorsten Joachims, "Text Categorization with Support Vector Machines:
   Learning with Many Relevant Features", LS-8

These would appear, on the face of it, to blow the patent out of the
water. I have not read them, but the titles suggest these are among the
first places to go.

In fact in a 1998 paper they acknowledge that they are not the first
to apply automatic machine learning classifiers to classifying email,
and cite a 1996 paper that concentrated on classifying email into
flame/non-flame. As you could use bogofilter for exactly this purpose,
we are home and dry if you ask me.

I suspect that Microsoft know they are on shaky ground, and provided you
don't implement the specific SVM that they mention in the patent they
will not do anything. Far too much bad press, and they are likely to lose.

JAB.

-- 
Jonathan A. Buzzard                 Email: jonathan at buzzard.org.uk
Northumberland, United Kingdom.       Tel: +44(0)1661-832195


