"url:" counts
Matt Garretson
mattg at assembly.state.ny.us
Thu Jan 8 20:18:19 CET 2004
David Relson wrote:
> Prompted by Matt's comment on the misnaming of "url:" tokens, I counted
> what's in my database and how many have very low or very high scores.
FWIW, here are my values:
(Note that my corpus's ham/spam ratio is about 1/2.)
    count   score
    9,734   < 0.01
   67,798   >= 0.99

      936   < 0.001
    4,049   >= 0.999

   79,551   "url:" tokens
            (61,332 of these are singletons)
  829,442   total tokens
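
To make the two sets of figures easier to compare, here is a rough Python sketch that turns the raw counts into proportions. It uses my numbers above and David's from the quoted message below. The function name and dictionary keys are made up for illustration, and it assumes the < 0.01 and >= 0.99 counts refer to "url:" tokens only (and that the < 0.001 and >= 0.999 rows are subsets of them, which is why they are left out).

```python
# Counts copied from the two messages in this thread.
# Assumption: "low" (< 0.01) and "high" (>= 0.99) count "url:" tokens only.
CORPORA = {
    "Matt": {"low": 9_734, "high": 67_798, "url": 79_551, "total": 829_442},
    "David": {"low": 29_160, "high": 62_052, "url": 96_917, "total": 1_178_243},
}


def url_shares(counts):
    """Return the share of "url:" tokens in each score bucket,
    plus the share of all tokens that are "url:" tokens."""
    return {
        "low": counts["low"] / counts["url"],
        "high": counts["high"] / counts["url"],
        "extreme": (counts["low"] + counts["high"]) / counts["url"],
        "url_of_total": counts["url"] / counts["total"],
    }


for name, counts in CORPORA.items():
    shares = url_shares(counts)
    summary = ", ".join(f"{key}={value:.1%}" for key, value in shares.items())
    print(f"{name}: {summary}")
```

With these inputs, the "extreme" share comes out well above 90% for both databases, so most "url:" tokens sit at one end of the score range or the other.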
> Here are some numbers:
>
> count score
> 29,160 < 0.01
> 62,052 >= 0.99
>
> 4,506 < 0.001
> 2,171 >= 0.999
>
> 96,917 "url:" tokens
> 1,178,243 total tokens