using fisher without robinson
David Relson
relson at osagesoftware.com
Wed Jan 1 01:57:23 CET 2003
At 07:52 PM 12/31/02, Greg Louis wrote:
>On 20021231 (Tue) at 13:28:38 -0500, David Relson wrote:
> > At 01:09 PM 12/31/02, Graham Wilson wrote:
> >
> > >is it possible to compile bogofilter with only the fisher code, i.e.
> > >without the robinson or graham code?
> >
> > Graham,
> >
> > Fisher without graham, yes. Fisher without robinson.c, no. In object
> > oriented terms, fisher is a subclass of robinson. In practical terms,
> > fisher is built upon the robinson method. It takes the robinson result and
> > applies a chi-square test to determine the likelihood that the robinson
> > result is spam (given the number of tokens used in getting the robinson
> > result). Leastways, as a non-statistician that's how I think it happens.
> >
>Shouldn't be so, though. There is no logical reason (developer
>convenience excluded) why compiling for fisher alone should be
>impossible.
>
>There are two processes involved in calculating a spamicity value
>(spamval henceforth). The first is to come up with some sort of
>probability estimate for each unique token in the message that meets
>inclusion criteria. This is Paul's p(w) and Gary's f(w). The second
>is to combine these values to yield an overall spamval. Robinson-gm
>does this by calculating the geometric means and combining those.
>Robinson-Fisher does it by applying Fisher's method for combining
>probabilities to P(spam) and P(nonspam) and combining those. Just
>because both Robinson methods start by calculating Robinson's f(w)
>values is no reason to force Fisher users to compile in the
>geometric-mean baggage. Rather than Fisher alone being a subclass of
>Robinson, Fisher and GM should both be subclasses of Robinson, each
>individually dispensable at compile time.
>
>Happy New Year all the same....... :)
Premature optimization. Memory is cheap - pennies per megabyte. Bah humbug.
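
For anyone following along, the two-stage split Greg describes can be sketched roughly as below. This is an illustration only, not bogofilter's code (which is C): it is written in Python for brevity, the function names and the defaults s=1.0 and x=0.5 are assumptions of mine, and the formulas are Gary Robinson's published ones as I understand them. The point is that both combiners depend only on the shared f(w) stage and not on each other.

```python
import math

def f_w(spam_count, ham_count, n_spam_msgs, n_ham_msgs, s=1.0, x=0.5):
    """Stage 1: Robinson's per-token estimate f(w).
    s (strength of the prior) and x (assumed value for unseen
    tokens) are illustrative defaults, not bogofilter's settings."""
    n = spam_count + ham_count
    if n == 0:
        return x  # never-seen token: fall back to the prior
    b = spam_count / n_spam_msgs
    g = ham_count / n_ham_msgs
    p = b / (b + g)  # basic p(w)
    return (s * x + n * p) / (s + n)

def chi2_sf(chi, dof):
    """Chi-square survival function for an even number of degrees of freedom."""
    m = chi / 2.0
    term = total = math.exp(-m)
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)

def combine_gm(fs):
    """Stage 2a: Robinson geometric-mean combining."""
    n = len(fs)
    P = 1.0 - math.exp(sum(math.log(1.0 - f) for f in fs) / n)
    Q = 1.0 - math.exp(sum(math.log(f) for f in fs) / n)
    return (1.0 + (P - Q) / (P + Q)) / 2.0

def combine_fisher(fs):
    """Stage 2b: Robinson-Fisher combining via the chi-square test."""
    n = len(fs)
    H = chi2_sf(-2.0 * sum(math.log(f) for f in fs), 2 * n)
    S = chi2_sf(-2.0 * sum(math.log(1.0 - f) for f in fs), 2 * n)
    return (1.0 + H - S) / 2.0

if __name__ == "__main__":
    # Hypothetical token counts: (spam hits, ham hits) out of 100 messages each.
    fs = [f_w(sc, hc, 100, 100) for sc, hc in [(40, 1), (25, 0), (3, 30)]]
    print("gm:", combine_gm(fs), "fisher:", combine_fisher(fs))
```

Either combiner can be dropped without touching the other or the f(w) stage, which is all the "both subclasses of Robinson" arrangement would require.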