"url:" counts

Matt Garretson mattg at assembly.state.ny.us
Thu Jan 8 20:18:19 CET 2004


David Relson wrote:
> Prompted by Matt's comment on the misnaming of "url:" tokens, I counted
> what's in my database and how many have very low or very high scores. 


FWIW, here are my values:

(Note that the ham/spam ratio of my corpus is about 1/2.)

    count    score

    9,734 <  0.01
   67,798 >= 0.99

      936 <  0.001
    4,049 >= 0.999

   79,551 "url:" tokens
          (61,332 of these are singletons)

  829,442 total tokens
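
For anyone who wants to produce the same breakdown from their own wordlist, here is a minimal sketch. It assumes the wordlist has already been exported into hypothetical (token, score, occurrences) records; the Entry record and the tally_url_tokens function are illustrative names, not bogofilter's own API or dump format. It also treats the bands as nested (a token scoring 0.0005 counts toward both "< 0.01" and "< 0.001"), which is how the tables here appear to read.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class Entry:
    # Hypothetical export record: the token text, its spam score, and how
    # many messages it was seen in (1 means the token is a singleton).
    token: str
    score: float
    occurrences: int


def tally_url_tokens(entries: Iterable[Entry]) -> Dict[str, int]:
    """Count "url:" tokens in the score bands used in the tables above."""
    counts = {
        "< 0.01": 0,
        ">= 0.99": 0,
        "< 0.001": 0,
        ">= 0.999": 0,
        "url_tokens": 0,
        "singletons": 0,
        "total_tokens": 0,
    }
    for e in entries:
        counts["total_tokens"] += 1
        if not e.token.startswith("url:"):
            continue
        counts["url_tokens"] += 1
        if e.occurrences == 1:
            counts["singletons"] += 1
        # Bands are nested: the 0.01/0.99 counts include the 0.001/0.999 ones.
        if e.score < 0.01:
            counts["< 0.01"] += 1
        if e.score >= 0.99:
            counts[">= 0.99"] += 1
        if e.score < 0.001:
            counts["< 0.001"] += 1
        if e.score >= 0.999:
            counts[">= 0.999"] += 1
    return counts


if __name__ == "__main__":
    sample = [
        Entry("url:example.com", 0.995, 1),
        Entry("url:example.org", 0.0005, 3),
        Entry("hello", 0.5, 10),
    ]
    print(tally_url_tokens(sample))
```

With the three sample records, two are "url:" tokens, one of them a singleton; one lands in both low bands and one in the ">= 0.99" band only.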



> Here are some numbers:
> 
>     count    score
>    29,160 <  0.01
>    62,052 >= 0.99
> 
>     4,506 <  0.001
>     2,171 >= 0.999
> 
>    96,917 "url:" tokens
> 1,178,243 total tokens
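
To put the two sets of numbers side by side despite the different corpus sizes, they can be reduced to proportions. This short worked sketch uses only the figures quoted in this thread and computes two ratios per corpus: the share of "url:" tokens scoring below 0.01 or at/above 0.99, and the share of all tokens that are "url:" tokens. The dictionary keys and the summarize function are illustrative names.

```python
from typing import Dict, Tuple

# Figures copied from the two tables in this thread.
CORPORA: Dict[str, Dict[str, int]] = {
    "Matt": {"low": 9_734, "high": 67_798, "url": 79_551, "total": 829_442},
    "David": {"low": 29_160, "high": 62_052, "url": 96_917, "total": 1_178_243},
}


def summarize(c: Dict[str, int]) -> Tuple[float, float]:
    """Return (extreme share of url: tokens, url: share of all tokens)."""
    # The < 0.01 and >= 0.99 bands cannot overlap, so their counts add.
    extreme = (c["low"] + c["high"]) / c["url"]
    url_share = c["url"] / c["total"]
    return extreme, url_share


if __name__ == "__main__":
    for name, c in CORPORA.items():
        extreme, url_share = summarize(c)
        print(f"{name}: {extreme:.1%} of url: tokens are < 0.01 or >= 0.99; "
              f"url: tokens are {url_share:.1%} of all tokens")
```

By this arithmetic, roughly 97.5% (Matt) and 94.1% (David) of "url:" tokens sit in the two extreme bands, and "url:" tokens make up about 9.6% and 8.2% of the respective wordlists.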





More information about the Bogofilter mailing list