"url:" counts

David Relson relson at osagesoftware.com
Thu Jan 8 20:28:29 CET 2004


On Thu, 08 Jan 2004 14:18:19 -0500
Matt Garretson <mattg at assembly.state.ny.us> wrote:

> David Relson wrote:
> > Prompted by Matt's comment on the misnaming of "url:" tokens, I
> > counted what's in my database and how many have very low or very
> > high scores. 
> 
> 
> FWIW, here are my values:
> 
> (note that my corpus' ham/spam ratio is about 1/2)
> 
>    count    score
> 
>    9,734 <  0.01
>   67,798 >= 0.99
> 
>      936 <  0.001
>    4,049 >= 0.999
> 
>   79,551 "url:" tokens
> (61,332 of these are singletons)
> 
> 829,442 total tokens

Lots of strong indicators, eh?
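
To put rough numbers on that, here is a small sketch that turns the quoted counts into percentages. It assumes the "< 0.01" and ">= 0.99" rows count "url:" tokens only (the table does not say so explicitly). The variable names and the share() helper are made up for illustration and are not part of bogofilter.

```python
# Counts copied from the quoted table; variable names are illustrative only.
url_tokens = 79_551      # all "url:" tokens
low = 9_734              # score <  0.01
high = 67_798            # score >= 0.99
singletons = 61_332      # "url:" tokens seen only once
total_tokens = 829_442   # every token in the wordlist


def share(part, whole):
    """Return part/whole as a percentage, rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


# Assumption: the low/high rows are subsets of the "url:" tokens.
strong_share = share(low + high, url_tokens)
singleton_share = share(singletons, url_tokens)
url_share = share(url_tokens, total_tokens)

print(f'strong "url:" tokens: {strong_share}% of all "url:" tokens')
print(f'singleton "url:" tokens: {singleton_share}% of all "url:" tokens')
print(f'"url:" tokens: {url_share}% of all tokens')
```

If that assumption holds, nearly all of the "url:" tokens sit at one extreme or the other. Since roughly three quarters of them are singletons, many of those extreme scores rest on a single message.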



