scaled counts [was: troublesome false negative]

Tue Nov 5 02:17:51 CET 2002

At 07:00 PM 11/4/02, Greg Louis wrote:

> >>> FWIW, the calculated .ROBX for my wordlist is approx 0.19.
>
> >> This should be calculated with scaled counts when the wordlist sizes
> >> differ.
>
> > And "scaled counts" means what?
>
>Same as it means when bogofilter is evaluating:
>  cb = count[spamlist:token]
>  cg = count[goodlist:token] * msgcount:spam / msgcount:good
>  f(w) = (s * x + cb) / (s + cg + cb)

I don't understand the initial reference to scaled counts.  Was it simply 
that I failed to mention scaling when I described how robx is 
calculated?  Or is something more significant happening?

bogoutil's "-x" option implements the same algorithm as the robx.pl Perl 
script, i.e. it adds up the per-token probabilities (with the scaling shown 
in the equations above).
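
For concreteness, the scaling in the quoted equations can be sketched in 
Python as below.  This is only an illustration, not the bogoutil or robx.pl 
source: the function names and the example counts are hypothetical, and 
estimating robx as the mean of cb / (cb + cg) over the tokens is an 
assumption about what "adding probabilities" amounts to.

```python
def scaled_counts(spam_count, good_count, spam_msgs, good_msgs):
    """Return (cb, cg), with the good count scaled to the spam list's size."""
    cb = spam_count
    cg = good_count * spam_msgs / good_msgs
    return cb, cg


def f_w(cb, cg, s, x):
    """f(w) = (s * x + cb) / (s + cg + cb), as in the quoted equations."""
    return (s * x + cb) / (s + cg + cb)


def estimate_robx(token_counts, spam_msgs, good_msgs):
    """Average the scaled per-token probability cb / (cb + cg).

    Assumption: robx is taken as a plain mean over all tokens that occur
    at least once; the real tools may filter or weight tokens differently.
    token_counts maps token -> (spam_count, good_count).
    """
    total = 0.0
    used = 0
    for spam_count, good_count in token_counts.values():
        cb, cg = scaled_counts(spam_count, good_count, spam_msgs, good_msgs)
        if cb + cg == 0:
            continue  # token never seen, so it has no probability
        total += cb / (cb + cg)
        used += 1
    return total / used if used else None


if __name__ == "__main__":
    # Hypothetical wordlists: 1000 spam messages, 2000 good messages.
    counts = {"viagra": (10, 5), "meeting": (0, 4)}
    robx = estimate_robx(counts, spam_msgs=1000, good_msgs=2000)
    cb, cg = scaled_counts(10, 5, 1000, 2000)
    print(robx, f_w(cb, cg, s=1.0, x=robx))
```

Without the scaling step, a token seen equally often per message in both 
lists would look "good" merely because the good list is larger, which is why 
the list sizes enter into cg.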

...[snip]...

> > >Wanna send me the output of the same command as seen with your word
> > >lists?  That might be interesting...
>
>I still think this would be interesting............

Do you want my probabilities for all the tokens in the message?  I'll be 
glad to send that info.