[Fwd: Re: Dealing with wordlist mails]
David Relson
relson at osagesoftware.com
Wed Jan 28 13:48:37 CET 2004
Greetings Manvendra,
FWIW, here are the histograms for the two messages (using my wordlist
and bogofilter's default parameters):
bogofilter -vv < mail1.txt
X-Bogosity: No, tests=bogofilter, spamicity=0.500000, version=0.16.4
int cnt prob spamicity histogram
0.00 104 0.024192 0.007692 ################################################
0.10 42 0.149538 0.025230 ####################
0.20 43 0.255139 0.054248 ####################
0.30 49 0.351806 0.096347 #######################
0.40 0 0.000000 0.096347
0.50 0 0.000000 0.096347
0.60 69 0.651821 0.207767 ################################
0.70 70 0.752134 0.303981 #################################
0.80 56 0.852745 0.372600 ##########################
0.90 81 0.967166 0.485076 ######################################
bogofilter -vv < mail2.txt
X-Bogosity: Yes, tests=bogofilter, spamicity=1.000000, version=0.16.4
int cnt prob spamicity histogram
0.00 13 0.040374 0.010020 #####
0.10 10 0.150322 0.030473 ####
0.20 15 0.257446 0.072353 #####
0.30 23 0.348196 0.137057 ########
0.40 0 0.000000 0.137057
0.50 0 0.000000 0.137057
0.60 24 0.649705 0.265520 ########
0.70 30 0.748952 0.391193 ##########
0.80 33 0.867653 0.504804 ###########
0.90 152 0.987423 0.713459 ################################################
The histograms don't "prove" anything, but they _do_ show that not all
wordlists will score both of these messages as 0.5.
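
For a rough side-by-side comparison of the two histograms, here is a small
sketch that tallies the share of tokens falling in the 0.60 and higher
buckets. The counts are copied from the "cnt" columns above. The 0.60
cutoff and the name spammy_share are only illustrative choices, and this is
not how bogofilter computes spamicity.

```python
# Token counts per bucket, copied from the "cnt" column of each histogram.
# Buckets are 0.00, 0.10, ..., 0.90, in that order.
mail1_counts = [104, 42, 43, 49, 0, 0, 69, 70, 56, 81]
mail2_counts = [13, 10, 15, 23, 0, 0, 24, 30, 33, 152]


def spammy_share(counts, first_spammy_bucket=6):
    """Fraction of tokens in bucket index 6 (0.60) and above.

    The cutoff is a hypothetical choice for illustration only; it is
    not a bogofilter parameter.
    """
    total = sum(counts)
    return sum(counts[first_spammy_bucket:]) / total


print(f"mail1: {sum(mail1_counts)} tokens, share >= 0.60: {spammy_share(mail1_counts):.3f}")
print(f"mail2: {sum(mail2_counts)} tokens, share >= 0.60: {spammy_share(mail2_counts):.3f}")
```

With my wordlist, a little over half of the tokens in mail1 land in the
upper buckets, against roughly four fifths for mail2.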
Cheers!
David