using fisher without robinson

Greg Louis glouis at dynamicro.on.ca
Wed Jan 1 01:52:38 CET 2003


On 20021231 (Tue) at 1328:38 -0500, David Relson wrote:
> At 01:09 PM 12/31/02, Graham Wilson wrote:
> 
> >is it possible to compile bogofilter with only the fisher code, i.e.
> >without the robinson or graham code?
> 
> Graham,
> 
> Fisher without graham, yes.  Fisher without robinson.c, no.  In object 
> oriented terms, fisher is a subclass of robinson.  In practical terms, 
> fisher is built upon the robinson method.  It takes the robinson result and 
> applies a chi-square test to determine the likelihood that the robinson 
> result is spam (given the number of tokens used in getting the robinson 
> result).  Leastways, as a non-statistician that's how I think it happens.
> 
Shouldn't be so, though.  There is no logical reason (developer
convenience excluded) why compiling for fisher alone should be
impossible.

There are two processes involved in calculating a spamicity value
(spamval henceforth).  The first is to come up with some sort of
probability estimate for each unique token in the message that meets
inclusion criteria.  This is Paul's p(w) and Gary's f(w).  The second
is to combine these values to yield an overall spamval.  Robinson-gm
does this by calculating the geometric means of the f(w) and 1-f(w)
values and combining those.  Robinson-Fisher does it by applying
Fisher's method for combining probabilities to the same values to get
P(spam) and P(nonspam), and combining those.  Just
because both Robinson methods start by calculating Robinson's f(w)
values is no reason to force Fisher users to compile in the
geometric-mean baggage.  Fisher is not the only subclass of Robinson;
Fisher and GM should both be subclasses of Robinson, each individually
dispensable at compile time.
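To make the two-stage split concrete, here is a minimal Python sketch.  It is not bogofilter's C code, and every class, function and default in it is hypothetical.  A shared Robinson base computes f(w) = (s*x + n*p(w)) / (s + n) for each token, and two independent subclasses supply the second stage: one combines geometric means, the other applies Fisher's method through an inverse chi-square function.  The defaults s = 1 and x = 0.5 are placeholders, and the token inclusion criteria are left out.

```python
import math
from abc import ABC, abstractmethod


def chi2q(x2, dof):
    """Upper-tail probability of a chi-square value with an even number of
    degrees of freedom (closed form, summed term by term)."""
    m = x2 / 2.0
    term = total = math.exp(-m)
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)  # guard against rounding slightly above 1


class Robinson(ABC):
    """Shared first stage: Robinson's per-token f(w)."""

    def __init__(self, s=1.0, x=0.5):
        # Hypothetical defaults: s is the strength given to the prior,
        # x the assumed probability for a token with no history.
        self.s = s
        self.x = x

    def f(self, spam_count, ham_count, n_spam_msgs, n_ham_msgs):
        """f(w) for one token, from the number of spam/ham messages containing it."""
        n = spam_count + ham_count
        if n == 0:
            return self.x  # unseen token: f(w) collapses to x
        b = spam_count / n_spam_msgs  # fraction of spam containing w
        g = ham_count / n_ham_msgs    # fraction of ham containing w
        p = b / (b + g)               # Graham-style p(w)
        return (self.s * self.x + n * p) / (self.s + n)

    def spamicity(self, token_counts, n_spam_msgs, n_ham_msgs):
        """Compute f(w) for every (spam_count, ham_count) pair, then combine."""
        fs = [self.f(sc, hc, n_spam_msgs, n_ham_msgs) for sc, hc in token_counts]
        if not fs:
            return 0.5  # assumption: no usable tokens gives a neutral score
        return self.combine(fs)

    @abstractmethod
    def combine(self, fs):
        """Second stage: merge the f(w) values into one spamval (near 1 = spam)."""


class RobinsonGM(Robinson):
    """Combine via the geometric means of f(w) and 1 - f(w)."""

    def combine(self, fs):
        n = len(fs)
        P = 1.0 - math.exp(sum(math.log(1.0 - f) for f in fs) / n)
        Q = 1.0 - math.exp(sum(math.log(f) for f in fs) / n)
        S = (P - Q) / (P + Q)
        return (1.0 + S) / 2.0


class RobinsonFisher(Robinson):
    """Combine via Fisher's method: -2 ln(product) is chi-square with 2n dof."""

    def combine(self, fs):
        n = len(fs)
        H = chi2q(-2.0 * sum(math.log(f) for f in fs), 2 * n)
        S = chi2q(-2.0 * sum(math.log(1.0 - f) for f in fs), 2 * n)
        return (1.0 + H - S) / 2.0
```

Neither subclass refers to the other, so either one could be dropped without touching the f(w) code, which is the kind of separation argued for above.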

Happy New Year all the same....... :)
-- 
| G r e g  L o u i s          | gpg public key:      |
|   http://www.bgl.nu/~glouis |   finger greg at bgl.nu |