[Fwd: Re: Dealing with wordlist mails]

David Relson relson at osagesoftware.com
Wed Jan 28 13:48:37 CET 2004


Greetings Manvendra,

FWIW, here are the histograms for the two messages (using my wordlist
and bogofilter's default parameters):

bogofilter -vv < mail1.txt
X-Bogosity: No, tests=bogofilter, spamicity=0.500000, version=0.16.4
   int  cnt   prob  spamicity histogram
  0.00  104 0.024192 0.007692 ################################################
  0.10   42 0.149538 0.025230 ####################
  0.20   43 0.255139 0.054248 ####################
  0.30   49 0.351806 0.096347 #######################
  0.40    0 0.000000 0.096347 
  0.50    0 0.000000 0.096347 
  0.60   69 0.651821 0.207767 ################################
  0.70   70 0.752134 0.303981 #################################
  0.80   56 0.852745 0.372600 ##########################
  0.90   81 0.967166 0.485076 ######################################

bogofilter -vv < mail2.txt
X-Bogosity: Yes, tests=bogofilter, spamicity=1.000000, version=0.16.4
   int  cnt   prob  spamicity histogram
  0.00   13 0.040374 0.010020 #####
  0.10   10 0.150322 0.030473 ####
  0.20   15 0.257446 0.072353 #####
  0.30   23 0.348196 0.137057 ########
  0.40    0 0.000000 0.137057 
  0.50    0 0.000000 0.137057 
  0.60   24 0.649705 0.265520 ########
  0.70   30 0.748952 0.391193 ##########
  0.80   33 0.867653 0.504804 ###########
  0.90  152 0.987423 0.713459 ################################################

The histograms don't "prove" anything, but they _do_ show that not all
wordlists will score both of these messages as 0.5.
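
One rough way to put a number on the difference is to parse the rows
above and compare how many tokens land in the top interval. The sketch
below is not part of bogofilter. It assumes "cnt" is the number of
tokens counted in each interval, and the names parse_histogram and
share_at_or_above are made up for illustration.

```python
import re

# Matches a histogram row: interval, count, prob, cumulative spamicity.
# Header lines, bar-only lines and blank lines do not match.
ROW = re.compile(r"^\s*(\d\.\d\d)\s+(\d+)\s+(\d\.\d+)\s+(\d\.\d+)")

# Rows copied from the two "bogofilter -vv" runs (bars omitted).
MAIL1 = """
  0.00  104 0.024192 0.007692
  0.10   42 0.149538 0.025230
  0.20   43 0.255139 0.054248
  0.30   49 0.351806 0.096347
  0.40    0 0.000000 0.096347
  0.50    0 0.000000 0.096347
  0.60   69 0.651821 0.207767
  0.70   70 0.752134 0.303981
  0.80   56 0.852745 0.372600
  0.90   81 0.967166 0.485076
"""

MAIL2 = """
  0.00   13 0.040374 0.010020
  0.10   10 0.150322 0.030473
  0.20   15 0.257446 0.072353
  0.30   23 0.348196 0.137057
  0.40    0 0.000000 0.137057
  0.50    0 0.000000 0.137057
  0.60   24 0.649705 0.265520
  0.70   30 0.748952 0.391193
  0.80   33 0.867653 0.504804
  0.90  152 0.987423 0.713459
"""


def parse_histogram(text):
    """Return a list of (interval, count, prob, spamicity) tuples."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            rows.append((float(m.group(1)), int(m.group(2)),
                         float(m.group(3)), float(m.group(4))))
    return rows


def share_at_or_above(rows, lower):
    """Fraction of all counted tokens in intervals starting at or above lower."""
    total = sum(cnt for _, cnt, _, _ in rows)
    hits = sum(cnt for interval, cnt, _, _ in rows if interval >= lower - 1e-9)
    return hits / total if total else 0.0


if __name__ == "__main__":
    for name, text in (("mail1", MAIL1), ("mail2", MAIL2)):
        rows = parse_histogram(text)
        print(f"{name}: {share_at_or_above(rows, 0.90):.1%} of tokens in the 0.90 interval")
```

With the counts above, this reports about 15.8% for mail1 (81 of 514)
and about 50.7% for mail2 (152 of 300). That is only a summary of the
counts shown, not a reimplementation of bogofilter's scoring.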

Cheers!

David



