scaled counts [was: troublesome false negative]

David Relson relson at osagesoftware.com
Tue Nov 5 02:17:51 CET 2002


At 07:00 PM 11/4/02, Greg Louis wrote:

> >>> FWIW, the calculated .ROBX for my wordlist is approx 0.19.
>
> >> This should be calculated with scaled counts when the wordlist sizes
> >> differ.
>
> > And "scaled counts" means what?
>
>Same as it means when bogofilter is evaluating:
>  cb = count[spamlist:token]
>  cg = count[goodlist:token] * msgcount:spam / msgcount:good
>  f(w) = (s * x + cb) / (s + cg + cb)

I don't understand the initial reference to scaled counts.  Was it simply 
that I failed to mention scaling when I described how robx is 
calculated?  Or is something more significant happening?

bogoutil's "-x" option implements the same algorithm as the robx.pl Perl 
script, i.e. adding up the per-token probabilities (with appropriate 
scaling, as shown in the equations above).
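For illustration, here is a minimal Python sketch of the scaled-count arithmetic quoted above. It rests on an assumption this thread does not confirm: that robx is the mean, over every token in the wordlist, of the scaled ratio cb / (cb + cg), which is f(w) with s = 0. The function names and the dict-based wordlist are hypothetical. They are not bogoutil's or robx.pl's actual code.

```python
# Hypothetical sketch of scaled counts and a robx-style average.
# Not bogoutil's or robx.pl's code; names and data layout are illustrative.


def scaled_good(good_count: int, spam_msgs: int, good_msgs: int) -> float:
    """cg = count[goodlist:token] * msgcount:spam / msgcount:good"""
    return good_count * spam_msgs / good_msgs


def robinson_f(cb: float, cg: float, s: float, x: float) -> float:
    """f(w) = (s * x + cb) / (s + cg + cb), where cg is already scaled."""
    return (s * x + cb) / (s + cg + cb)


def compute_robx(wordlist: dict[str, tuple[int, int]],
                 spam_msgs: int, good_msgs: int) -> float:
    """ASSUMPTION: robx is the mean of cb / (cb + cg) over all tokens.

    wordlist maps token -> (spam count, good count); this layout is hypothetical.
    """
    total = 0.0
    n = 0
    for spam_count, good_count in wordlist.values():
        cb = float(spam_count)
        cg = scaled_good(good_count, spam_msgs, good_msgs)
        if cb + cg == 0:
            continue  # a token with no counts contributes nothing
        total += cb / (cb + cg)  # same as robinson_f(cb, cg, 0, x) for any x
        n += 1
    if n == 0:
        raise ValueError("wordlist has no tokens with nonzero counts")
    return total / n


if __name__ == "__main__":
    words = {"free": (30, 10), "meeting": (2, 40), "offer": (20, 20)}
    # 100 spam vs. 400 good messages: each good count is scaled by 0.25
    print(compute_robx(words, spam_msgs=100, good_msgs=400))
```

With 100 spam and 400 good messages, the scale factor is 0.25. Without scaling, the larger good list would pull every token's ratio toward "good". When both lists hold the same number of messages, the factor is 1 and cg is the raw count. Note also that f(w) is Robinson's (s*x + n*p(w)) / (s + n) with n = cb + cg and p(w) = cb / (cb + cg).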

...[snip]...

> > >Wanna send me the output of the same command as seen with your word
> > >lists?  That might be interesting...
>
>I still think this would be interesting............

Do you want my probabilities for all the tokens in the message?   I'll be 
glad to send that info.






More information about the Bogofilter mailing list