using fisher without robinson

David Relson relson at osagesoftware.com
Wed Jan 1 01:57:23 CET 2003


At 07:52 PM 12/31/02, Greg Louis wrote:

>On 20021231 (Tue) at 1328:38 -0500, David Relson wrote:
> > At 01:09 PM 12/31/02, Graham Wilson wrote:
> >
> > >is it possible to compile bogofilter with only the fisher code, i.e.
> > >without the robinson or graham code?
> >
> > Graham,
> >
> > Fisher without graham, yes.  Fisher without robinson.c, no.  In object
> > oriented terms, fisher is a subclass of robinson.  In practical terms,
> > fisher is built upon the robinson method.  It takes the robinson result and
> > applies a chi-square test to determine the likelihood that the robinson
> > result is spam (given the number of tokens used in getting the robinson
> > result).  Leastways, as a non-statistician that's how I think it happens.
> >
>Shouldn't be so, though.  There is no logical reason (developer
>convenience excluded) why compiling for fisher alone should be
>impossible.
>
>There are two processes involved in calculating a spamicity value
>(spamval henceforth).  The first is to come up with some sort of
>probability estimate for each unique token in the message that meets
>inclusion criteria.  This is Paul's p(w) and Gary's f(w).  The second
>is to combine these values to yield an overall spamval.  Robinson-gm
>does this by calculating the geometric means and combining those.
>Robinson-Fisher does it by applying Fisher's method for combining
>probabilities to P(spam) and P(nonspam) and combining those.  Just
>because both Robinson methods start by calculating Robinson's f(w)
>values is no reason to force Fisher users to compile in the
>geometric-mean baggage.  Not only is Fisher a subclass of Robinson;
>Fisher and GM should both be subclasses of Robinson, individually
>dispensable at compile time.
>
>Happy New Year all the same....... :)

Premature optimization.  Memory is cheap - pennies per megabyte.  Bah humbug.
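
For anyone following the thread, the two stages Greg describes can be
sketched roughly as below.  This is an illustration in Python, not
bogofilter's C code.  The function names, the defaults s=1.0 and x=0.5,
and the simplification that the spam and non-spam corpora are the same
size are all assumptions made for the example.  The formulas follow Gary
Robinson's published description as I understand it.

```python
import math

def robinson_fw(spam_count, ham_count, s=1.0, x=0.5):
    """Stage 1: Robinson's f(w) for one token.

    Assumes equally sized spam and non-spam corpora, so the raw
    estimate p(w) is just the fraction of sightings that were spam.
    s (strength of the prior) and x (value assumed for unseen tokens)
    are illustrative defaults, not bogofilter's settings.
    """
    n = spam_count + ham_count
    p = spam_count / n if n else x
    return (s * x + n * p) / (s + n)

def combine_geometric_mean(fw):
    """Stage 2, variant A: Robinson geometric-mean combination."""
    n = len(fw)
    P = 1.0 - math.exp(sum(math.log(1.0 - f) for f in fw) / n)
    Q = 1.0 - math.exp(sum(math.log(f) for f in fw) / n)
    return (1.0 + (P - Q) / (P + Q)) / 2.0

def chi2_sf_even(chi, dof):
    """Upper-tail chi-square probability for an even number of
    degrees of freedom (closed-form series)."""
    m = chi / 2.0
    term = total = math.exp(-m)
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)

def combine_fisher(fw):
    """Stage 2, variant B: Fisher's method applied in both directions."""
    n = len(fw)
    from_f = chi2_sf_even(-2.0 * sum(math.log(f) for f in fw), 2 * n)
    from_1mf = chi2_sf_even(-2.0 * sum(math.log(1.0 - f) for f in fw), 2 * n)
    return (1.0 + from_f - from_1mf) / 2.0

# Both combiners consume the same list of f(w) values and nothing else.
tokens = [robinson_fw(10, 0), robinson_fw(7, 1), robinson_fw(0, 0)]
print(combine_geometric_mean(tokens), combine_fisher(tokens))
```

In this sketch the only thing the two combiners share is the list of
f(w) values from stage 1, which is the point Greg makes: either
combiner could be left out without affecting the other.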