Getting rid of plain obvious spam

Andreas Pardeike andreas at pardeike.net
Wed Apr 7 11:35:22 CEST 2004


Hi,

I am using bogofilter successfully, but somehow I can't get it
to recognise plain spam, even when it should be rather simple to
detect. What's so special about the message shown in the
attachment (the report includes token and histogram output too),
and what can I do to get better results on obvious buzzwords?

Simple training does not seem to be enough (I run all incoming mail
through bogofilter -u and correct errors with -sN and -Sn as appropriate).

Any help appreciated,
Andreas Pardeike

-------------- next part --------------
Received: from addesign.de (DDVWWR01.cpe.hoov.al.charter.com [68.191.109.6] (may be forged))
	by localhost.localdomain (8.12.8/8.12.8) with ESMTP id i379N32Q020692
	for <andreas at pardeike.net>; Wed, 7 Apr 2004 11:23:04 +0200
Message-ID: <GKKLDIOKFNJENNNBKODJPOGCLNAA.laura_michaeloz at blues.uab.es>
From: "Laura E. Michael" <laura_michaeloz at blues.uab.es>
To: andreas at pardeike.net
Subject: I need you here
Date: Thu, 08 Apr 2004 01:48:33 +0000
MIME-Version: 1.0
Content-Type: text/plain
X-Bogosity: No, spamicity 0.595582
Content-Transfer-Encoding: 8bit
X-MIME-Autoconverted: from base64 to 8bit by localhost.localdomain id i379N32Q020692



Buy Viagra and Cialas Aka "Super Viagra"..The Viagra that last all weekend!.. 
and other good prescriptions....Save big on this site check it out.. Next-Day Fedex ... 
here at.. http://no4bb527wd.excellentrxmd.com?rid=1000



of (eg Command The website have of or two latter 




bogofilter -vvv < viagra01.txt 
X-Bogosity: No, tests=bogofilter, spamicity=0.987342, version=0.17.5
                                     n    pgood     pbad      fw     U
"from:blues.uab.es"                  1  0.001468  0.000000  0.011274 +
"from:laura_michaeloz"               1  0.001468  0.000000  0.011274 +
"head:plain"                       480  0.478708  0.046526  0.088602 +
"from:Michael"                      17  0.013216  0.002417  0.155118 -
"from:Laura"                         2  0.001468  0.000302  0.174816 -
"head:localhost.localdomain"       115  0.061674  0.022054  0.263463 -
"subj:need"                         39  0.019090  0.007855  0.291685 -
"head:from"                         99  0.041116  0.021450  0.342894 -
"two"                              126  0.044053  0.029003  0.397033 -
"head:text"                       1526  0.531571  0.351662  0.398156 -
"head:bit"                         945  0.325991  0.218429  0.401219 -
"have"                            1019  0.314244  0.243202  0.436283 -
"head:Content-Transfer-Encoding"   1186  0.361233  0.283988  0.440143 -
"that"                            1205  0.364170  0.289124  0.442566 -
"rcvd:Wed"                         872  0.259912  0.209970  0.446861 -
"rcvd:andreas"                    1292  0.384728  0.311178  0.447158 -
"this"                            1315  0.390602  0.316918  0.447931 -
"rcvd:for"                        2918  0.798825  0.717221  0.473087 -
"head:X-MIME-Autoconverted"         86  0.023495  0.021148  0.473751 -
"rcvd:localhost.localdomain"      2831  0.734214  0.704230  0.489578 -
"rcvd:ESMTP"                      1277  0.330396  0.317825  0.490305 -
"rcvd:from"                       3642  0.911894  0.912689  0.500218 -
"head:Date"                       3955  0.986784  0.991843  0.501279 -
"rcvd:with"                       3270  0.785609  0.826284  0.512618 -
"rcvd:pardeike.net"               1930  0.418502  0.496979  0.542862 -
"out"                              616  0.132159  0.158912  0.545960 -
"rcvd:Apr"                        2910  0.621145  0.751360  0.547437 -
"http"                            3417  0.715125  0.885196  0.553137 -
"The"                             1048  0.218796  0.271601  0.553841 -
"head:Message-ID"                 2946  0.612335  0.764048  0.555114 -
"last"                             147  0.029369  0.038369  0.566443 -
"and"                             2191  0.428781  0.573716  0.572287 -
"check"                            194  0.036711  0.051057  0.581736 -
"head:Content-Type"               3696  0.679883  0.976737  0.589597 -
"good"                             159  0.026432  0.042598  0.617101 -
"head:MIME-Version"               3220  0.530103  0.863746  0.619684 -
"all"                              923  0.151248  0.247734  0.620916 -
"other"                            403  0.064611  0.108459  0.626679 -
"no4bb527wd.excellentrxmd.com"       0  0.000000  0.000000  0.644661 -
"rcvd:68.191.109.6"                  0  0.000000  0.000000  0.644661 -
"rcvd:forged"                      178  0.022026  0.049245  0.690944 -
"rcvd:may"                         178  0.022026  0.049245  0.690944 -
"site"                             330  0.035242  0.092447  0.723996 -
"here"                             862  0.088106  0.242296  0.733336 -
"to:andreas"                      1455  0.101322  0.418731  0.805169 -
"to:pardeike.net"                 2432  0.155653  0.702719  0.818663 -
"subj:here"                         29  0.001468  0.008459  0.851960 -
"website"                          209  0.010279  0.061027  0.855829 -
"head:base64"                      171  0.001468  0.051360  0.972169 +
"subj:you"                         191  0.001468  0.057402  0.975026 +
"rcvd:addesign.de"                   1  0.000000  0.000302  0.993786 +
"Command"                           18  0.000000  0.005438  0.999649 +
"latter"                            20  0.000000  0.006042  0.999684 +
"prescriptions....Save"             34  0.000000  0.010272  0.999814 +
"Aka"                               35  0.000000  0.010574  0.999819 +
"Cialas"                            35  0.000000  0.010574  0.999819 +
"Next-Day"                          35  0.000000  0.010574  0.999819 +
"rid"                               38  0.000000  0.011480  0.999834 +
"weekend!"                          39  0.000000  0.011782  0.999838 +
"Fedex"                             50  0.000000  0.015106  0.999874 +
"Super"                            103  0.000000  0.031118  0.999939 +
"big"                              109  0.000000  0.032931  0.999942 +
"Viagra"                           110  0.000000  0.033233  0.999943 +
"Buy"                              119  0.000000  0.035952  0.999947 +
N_P_Q_S_s_x_md                      19  0.00e+00  9.75e-01  9.87e-01
                                        1.78e-02  6.45e-01  0.375
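
The fw column above appears to follow Robinson's smoothed token estimate, f(w) = (robs * robx + n * p(w)) / (robs + n) with p(w) = pbad / (pbad + pgood), using the robs and robx values that bogofilter -Q reports. The Python sketch below is an assumption checked only against a handful of rows of this output, not bogofilter's own code. It also shows the min_dev test that appears to decide the U column: a token seems to count only if |fw - 0.5| >= min_dev, so with min_dev = 0.375 only tokens below 0.125 or above 0.875 are used.

```python
# Sketch: reproduce the "fw" and "U" columns of `bogofilter -vvv`.
# Assumed formula (Robinson's smoothing), checked only against the rows below:
#   f(w) = (robs * robx + n * p(w)) / (robs + n),  p(w) = pbad / (pbad + pgood)

ROBS = 0.017800     # robs from `bogofilter -Q`
ROBX = 0.644661     # robx from `bogofilter -Q`
MIN_DEV = 0.375     # min_dev from `bogofilter -Q`


def token_fw(n, pgood, pbad, robs=ROBS, robx=ROBX):
    """Smoothed spam probability of a single token."""
    if pgood + pbad == 0.0:
        return robx  # never seen in training: the estimate is just robx
    p = pbad / (pbad + pgood)
    return (robs * robx + n * p) / (robs + n)


def is_used(fw, min_dev=MIN_DEV):
    """Assumed rule for '+' in the U column: far enough from a neutral 0.5."""
    return abs(fw - 0.5) >= min_dev


# (token, n, pgood, pbad, fw as printed, U as printed), copied from the table
ROWS = [
    ("from:blues.uab.es",             1, 0.001468, 0.000000, 0.011274, "+"),
    ("head:plain",                  480, 0.478708, 0.046526, 0.088602, "+"),
    ("from:Michael",                 17, 0.013216, 0.002417, 0.155118, "-"),
    ("from:Laura",                    2, 0.001468, 0.000302, 0.174816, "-"),
    ("no4bb527wd.excellentrxmd.com",  0, 0.000000, 0.000000, 0.644661, "-"),
    ("subj:here",                    29, 0.001468, 0.008459, 0.851960, "-"),
    ("website",                     209, 0.010279, 0.061027, 0.855829, "-"),
    ("head:base64",                 171, 0.001468, 0.051360, 0.972169, "+"),
    ("rcvd:addesign.de",              1, 0.000000, 0.000302, 0.993786, "+"),
    ("Buy",                         119, 0.000000, 0.035952, 0.999947, "+"),
]

for token, n, pgood, pbad, printed, u in ROWS:
    fw = token_fw(n, pgood, pbad)
    mark = "+" if is_used(fw) else "-"
    print(f"{token:30s} {fw:.6f} (printed {printed:.6f})  {mark} (printed {u})")
```

The recomputed values agree with the printed ones to about 1e-5; the small differences are consistent with pgood and pbad being rounded to six decimals in the listing.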

bogofilter -vv < viagra01.txt
X-Bogosity: No, tests=bogofilter, spamicity=0.987342, version=0.17.5
   int  cnt   prob  spamicity histogram
  0.00    3 0.037050 0.013254 ###
  0.10    0 0.000000 0.013254 
  0.20    0 0.000000 0.013254 
  0.30    0 0.000000 0.013254 
  0.40    0 0.000000 0.013254 
  0.50    0 0.000000 0.013254 
  0.60    0 0.000000 0.013254 
  0.70    0 0.000000 0.013254 
  0.80    0 0.000000 0.013254 
  0.90   16 0.996181 0.688076 ################
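
The histogram seems to count only the 19 tokens marked + in the -vvv table: 3 land in the 0.00 bin and 16 in the 0.90 bin. The prob column matches the mean fw of the tokens in each bin; for example, (0.011274 + 0.011274 + 0.088602) / 3 = 0.037050. Below is a small Python sketch of that binning, assuming ten bins of width 0.1. It does not try to reproduce the histogram's spamicity column.

```python
# Sketch: bin the fw values of the tokens marked "+" (U column) into ten
# intervals of width 0.1, as the `bogofilter -vv` histogram appears to do.
# Assumption: "cnt" is the token count per bin and "prob" the mean fw in it.

# fw of the 19 tokens with U = "+" in the -vvv output
USED_FW = [
    0.011274, 0.011274, 0.088602,
    0.972169, 0.975026, 0.993786, 0.999649, 0.999684, 0.999814,
    0.999819, 0.999819, 0.999819, 0.999834, 0.999838, 0.999874,
    0.999939, 0.999942, 0.999943, 0.999947,
]


def histogram(values, bins=10):
    """Return (counts, means) for equal-width bins over [0, 1]."""
    counts = [0] * bins
    sums = [0.0] * bins
    for v in values:
        i = min(int(v * bins), bins - 1)  # a value of exactly 1.0 goes in the last bin
        counts[i] += 1
        sums[i] += v
    means = [s / c if c else 0.0 for s, c in zip(sums, counts)]
    return counts, means


counts, means = histogram(USED_FW)
for i, (c, m) in enumerate(zip(counts, means)):
    print(f"{i / 10:5.2f} {c:4d} {m:.6f} {'#' * c}")
```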


bogofilter -Q
# bogofilter version 0.17.5

robx        = 0.644661  # (6.45e-01)
robs        = 0.017800  # (1.78e-02)
min_dev     = 0.375000  # (3.75e-01)
ham_cutoff  = 0.000000  # (0.00e+00)
spam_cutoff = 0.990000  # (9.90e-01)

block_on_subnets  = no
charset_default   = us-ascii
replace_nonascii_characters = no
stats_in_header   = yes
thresh_update     = 0.000000
timestamp         = yes

terse             = no
spam_header_name  = X-Bogosity
spam_subject_tag  = [SPAM]
unsure_subject_tag = [UNSURE]
header_format     = %h: %c, tests=bogofilter, spamicity=%p, version=%v
terse_format      = %1.1c %f
log_header_format = %h: %c, spamicity=%p, version=%v
log_update_format = register-%r, %w words, %m messages
spamicity_tags    = Yes, No
spamicity_formats = %0.6f, %0.6f
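
With ham_cutoff = 0 and spamicity_tags = Yes, No, the verdict appears to be two-way: a message is tagged Yes only when its spamicity reaches spam_cutoff. This message scores 0.987342, just under spam_cutoff = 0.990000, which would explain why it is marked No despite the many spammy tokens. The spamicity itself can be reproduced by assuming Robinson's chi-square (Fisher) combining of the 19 used fw values, with S = (1 + Q - P) / 2, as the N_P_Q_S_s_x_md summary line suggests (P = 0.00, Q = 0.975, S = 0.987). The Python sketch below writes out that assumption; it is not bogofilter's code, and the exact cutoff comparisons (>= versus >) and the Unsure rule are guesses.

```python
import math

# fw of the 19 tokens with U = "+" in the -vvv output
USED_FW = [
    0.011274, 0.011274, 0.088602,
    0.972169, 0.975026, 0.993786, 0.999649, 0.999684, 0.999814,
    0.999819, 0.999819, 0.999819, 0.999834, 0.999838, 0.999874,
    0.999939, 0.999942, 0.999943, 0.999947,
]


def chi2q(x2, dof):
    """Upper-tail probability of a chi-square statistic with even dof.

    Closed form: exp(-m) * sum_{i < dof/2} m**i / i!, where m = x2 / 2.
    """
    assert dof % 2 == 0
    m = x2 / 2.0
    term = math.exp(-m)
    total = term
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def fisher_spamicity(fws):
    """Combine token probabilities; returns (P, Q, S).

    P is small when many tokens have fw near 1, Q is small when many have
    fw near 0, and S = (1 + Q - P) / 2 lies between 0 and 1.
    """
    n = len(fws)
    p = chi2q(-2.0 * sum(math.log(1.0 - f) for f in fws), 2 * n)
    q = chi2q(-2.0 * sum(math.log(f) for f in fws), 2 * n)
    return p, q, (1.0 + q - p) / 2.0


def verdict(s, spam_cutoff=0.99, ham_cutoff=0.0):
    """Assumed tagging rule; with ham_cutoff == 0 there is no Unsure band."""
    if s >= spam_cutoff:
        return "Yes"
    if ham_cutoff > 0.0 and s > ham_cutoff:
        return "Unsure"
    return "No"


p, q, s = fisher_spamicity(USED_FW)
print(f"P={p:.2e} Q={q:.2e} S={s:.6f} -> {verdict(s)}")
```

Under these assumptions the three hammy header tokens (from:blues.uab.es, from:laura_michaeloz, head:plain) are what hold Q below 1 and keep S just short of the cutoff.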

