Getting rid of plain obvious spam
Andreas Pardeike
andreas at pardeike.net
Wed Apr 7 11:35:22 CEST 2004
Hi,
I am using bogofilter successfully, but somehow I can't get it
to recognise plain spam even when it should be rather simple to
detect. What's so special about the message shown in the
attachment (the report includes the token and histogram output too),
and what can I do to get better results on obvious buzzwords?
Simple training doesn't seem to be enough (I run all incoming mail through
bogofilter -u and correct with -sN and -Sn accordingly).
Any help appreciated,
Andreas Pardeike
-------------- next part --------------
Received: from addesign.de (DDVWWR01.cpe.hoov.al.charter.com [68.191.109.6] (may be forged))
    by localhost.localdomain (8.12.8/8.12.8) with ESMTP id i379N32Q020692
    for <andreas at pardeike.net>; Wed, 7 Apr 2004 11:23:04 +0200
Message-ID: <GKKLDIOKFNJENNNBKODJPOGCLNAA.laura_michaeloz at blues.uab.es>
From: "Laura E. Michael" <laura_michaeloz at blues.uab.es>
To: andreas at pardeike.net
Subject: I need you here
Date: Thu, 08 Apr 2004 01:48:33 +0000
MIME-Version: 1.0
Content-Type: text/plain
X-Bogosity: No, spamicity 0.595582
Content-Transfer-Encoding: 8bit
X-MIME-Autoconverted: from base64 to 8bit by localhost.localdomain id i379N32Q020692
Buy Viagra and Cialas Aka "Super Viagra"..The Viagra that last all weekend!..
and other good prescriptions....Save big on this site check it out.. Next-Day Fedex ...
here at.. http://no4bb527wd.excellentrxmd.com?rid=1000
of (eg Command The website have of or two latter
bogofilter -vvv < viagra01.txt
X-Bogosity: No, tests=bogofilter, spamicity=0.987342, version=0.17.5
                                     n  pgood     pbad      fw        U
"from:blues.uab.es"                  1  0.001468  0.000000  0.011274  +
"from:laura_michaeloz"               1  0.001468  0.000000  0.011274  +
"head:plain"                       480  0.478708  0.046526  0.088602  +
"from:Michael"                      17  0.013216  0.002417  0.155118  -
"from:Laura"                         2  0.001468  0.000302  0.174816  -
"head:localhost.localdomain"       115  0.061674  0.022054  0.263463  -
"subj:need"                         39  0.019090  0.007855  0.291685  -
"head:from"                         99  0.041116  0.021450  0.342894  -
"two"                              126  0.044053  0.029003  0.397033  -
"head:text"                       1526  0.531571  0.351662  0.398156  -
"head:bit"                         945  0.325991  0.218429  0.401219  -
"have"                            1019  0.314244  0.243202  0.436283  -
"head:Content-Transfer-Encoding"  1186  0.361233  0.283988  0.440143  -
"that"                            1205  0.364170  0.289124  0.442566  -
"rcvd:Wed"                         872  0.259912  0.209970  0.446861  -
"rcvd:andreas"                    1292  0.384728  0.311178  0.447158  -
"this"                            1315  0.390602  0.316918  0.447931  -
"rcvd:for"                        2918  0.798825  0.717221  0.473087  -
"head:X-MIME-Autoconverted"         86  0.023495  0.021148  0.473751  -
"rcvd:localhost.localdomain"      2831  0.734214  0.704230  0.489578  -
"rcvd:ESMTP"                      1277  0.330396  0.317825  0.490305  -
"rcvd:from"                       3642  0.911894  0.912689  0.500218  -
"head:Date"                       3955  0.986784  0.991843  0.501279  -
"rcvd:with"                       3270  0.785609  0.826284  0.512618  -
"rcvd:pardeike.net"               1930  0.418502  0.496979  0.542862  -
"out"                              616  0.132159  0.158912  0.545960  -
"rcvd:Apr"                        2910  0.621145  0.751360  0.547437  -
"http"                            3417  0.715125  0.885196  0.553137  -
"The"                             1048  0.218796  0.271601  0.553841  -
"head:Message-ID"                 2946  0.612335  0.764048  0.555114  -
"last"                             147  0.029369  0.038369  0.566443  -
"and"                             2191  0.428781  0.573716  0.572287  -
"check"                            194  0.036711  0.051057  0.581736  -
"head:Content-Type"               3696  0.679883  0.976737  0.589597  -
"good"                             159  0.026432  0.042598  0.617101  -
"head:MIME-Version"               3220  0.530103  0.863746  0.619684  -
"all"                              923  0.151248  0.247734  0.620916  -
"other"                            403  0.064611  0.108459  0.626679  -
"no4bb527wd.excellentrxmd.com"       0  0.000000  0.000000  0.644661  -
"rcvd:68.191.109.6"                  0  0.000000  0.000000  0.644661  -
"rcvd:forged"                      178  0.022026  0.049245  0.690944  -
"rcvd:may"                         178  0.022026  0.049245  0.690944  -
"site"                             330  0.035242  0.092447  0.723996  -
"here"                             862  0.088106  0.242296  0.733336  -
"to:andreas"                      1455  0.101322  0.418731  0.805169  -
"to:pardeike.net"                 2432  0.155653  0.702719  0.818663  -
"subj:here"                         29  0.001468  0.008459  0.851960  -
"website"                          209  0.010279  0.061027  0.855829  -
"head:base64"                      171  0.001468  0.051360  0.972169  +
"subj:you"                         191  0.001468  0.057402  0.975026  +
"rcvd:addesign.de"                   1  0.000000  0.000302  0.993786  +
"Command"                           18  0.000000  0.005438  0.999649  +
"latter"                            20  0.000000  0.006042  0.999684  +
"prescriptions....Save"             34  0.000000  0.010272  0.999814  +
"Aka"                               35  0.000000  0.010574  0.999819  +
"Cialas"                            35  0.000000  0.010574  0.999819  +
"Next-Day"                          35  0.000000  0.010574  0.999819  +
"rid"                               38  0.000000  0.011480  0.999834  +
"weekend!"                          39  0.000000  0.011782  0.999838  +
"Fedex"                             50  0.000000  0.015106  0.999874  +
"Super"                            103  0.000000  0.031118  0.999939  +
"big"                              109  0.000000  0.032931  0.999942  +
"Viagra"                           110  0.000000  0.033233  0.999943  +
"Buy"                              119  0.000000  0.035952  0.999947  +
N_P_Q_S_s_x_md 19 0.00e+00 9.75e-01 9.87e-01
1.78e-02 6.45e-01 0.375
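
The fw column in the report above appears to follow Robinson's smoothing formula, fw = (s*x + n*p) / (s + n), with p = pbad / (pgood + pbad), s = robs (1.78e-02) and x = robx (6.45e-01). The sketch below is only an illustration under that assumption, not bogofilter's own code; the helper names token_fw and is_used are made up here, and the ">=" comparison against min_dev is likewise assumed. With the rounded values printed in the report it reproduces the fw column to roughly four decimal places.

```python
# Values taken from the N_P_Q_S_s_x_md summary line of the report.
ROBS = 0.017800     # s: strength given to the prior
ROBX = 0.644661     # x: score assumed for a token never seen before
MIN_DEV = 0.375     # tokens closer than this to 0.5 are ignored


def token_fw(n, pgood, pbad, robs=ROBS, robx=ROBX):
    """Smoothed spam probability for one token (hypothetical helper)."""
    total = pgood + pbad
    # p is irrelevant when n == 0, because it is multiplied by n below.
    p = pbad / total if total > 0 else 0.0
    return (robs * robx + n * p) / (robs + n)


def is_used(fw, min_dev=MIN_DEV):
    """True when the token is far enough from 0.5 to be scored ('+')."""
    return abs(fw - 0.5) >= min_dev


# A few rows copied from the report: (token, n, pgood, pbad).
rows = [
    ("from:blues.uab.es", 1, 0.001468, 0.000000),
    ("head:plain", 480, 0.478708, 0.046526),
    ("website", 209, 0.010279, 0.061027),
    ("Buy", 119, 0.000000, 0.035952),
    ("rcvd:68.191.109.6", 0, 0.000000, 0.000000),
]

for token, n, pgood, pbad in rows:
    fw = token_fw(n, pgood, pbad)
    mark = "+" if is_used(fw) else "-"
    print(f"{token:<20} fw={fw:.6f} {mark}")
```

Under these assumptions a token such as "website" (fw about 0.856) is left out of the score because it falls inside the 0.5 +/- 0.375 band, which matches the "-" in the U column.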
bogofilter -vv < viagra01.txt
X-Bogosity: No, tests=bogofilter, spamicity=0.987342, version=0.17.5
int   cnt  prob      spamicity  histogram
0.00    3  0.037050  0.013254   ###
0.10    0  0.000000  0.013254
0.20    0  0.000000  0.013254
0.30    0  0.000000  0.013254
0.40    0  0.000000  0.013254
0.50    0  0.000000  0.013254
0.60    0  0.000000  0.013254
0.70    0  0.000000  0.013254
0.80    0  0.000000  0.013254
0.90   16  0.996181  0.688076   ################
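
The histogram above covers the 19 scored tokens: 3 with fw near 0 and 16 with fw near 1. As a rough cross-check, the final spamicity can be approximated by combining those 19 fw values with Fisher's chi-square method as described by Gary Robinson. This is a sketch under that assumption, not bogofilter's actual implementation; chi2_q and fisher_spamicity are hypothetical names, and mapping p and q onto the P and Q fields of the N_P_Q_S_s_x_md line is a guess based on the numbers.

```python
import math


def chi2_q(x, df):
    """Upper-tail chi-square probability; closed form valid for even df."""
    m = x / 2.0
    term = math.exp(-m)
    total = term
    for i in range(1, df // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def fisher_spamicity(fws):
    """Return (p, q, s) for a list of token scores strictly between 0 and 1."""
    df = 2 * len(fws)
    p = chi2_q(-2.0 * sum(math.log(1.0 - f) for f in fws), df)
    q = chi2_q(-2.0 * sum(math.log(f) for f in fws), df)
    return p, q, (1.0 + q - p) / 2.0


# fw values of the 19 tokens marked "+" in the -vvv report.
USED_FW = [
    0.011274, 0.011274, 0.088602,
    0.972169, 0.975026, 0.993786, 0.999649, 0.999684, 0.999814,
    0.999819, 0.999819, 0.999819, 0.999834, 0.999838, 0.999874,
    0.999939, 0.999942, 0.999943, 0.999947,
]

p, q, s = fisher_spamicity(USED_FW)
print(f"P={p:.2e} Q={q:.2e} S={s:.6f}")
```

With the rounded inputs this lands within about 0.001 of the reported spamicity of 0.987342. It also suggests that the three hammy tokens ("from:blues.uab.es", "from:laura_michaeloz" and "head:plain") are what hold Q, and therefore the score, below 1.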
bogofilter -Q
# bogofilter version 0.17.5
robx = 0.644661 # (6.45e-01)
robs = 0.017800 # (1.78e-02)
min_dev = 0.375000 # (3.75e-01)
ham_cutoff = 0.000000 # (0.00e+00)
spam_cutoff = 0.990000 # (9.90e-01)
block_on_subnets = no
charset_default = us-ascii
replace_nonascii_characters = no
stats_in_header = yes
thresh_update = 0.000000
timestamp = yes
terse = no
spam_header_name = X-Bogosity
spam_subject_tag = [SPAM]
unsure_subject_tag = [UNSURE]
header_format = %h: %c, tests=bogofilter, spamicity=%p, version=%v
terse_format = %1.1c %f
log_header_format = %h: %c, spamicity=%p, version=%v
log_update_format = register-%r, %w words, %m messages
spamicity_tags = Yes, No
spamicity_formats = %0.6f, %0.6f
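
The settings above are consistent with the "X-Bogosity: No" verdict: the message scores 0.987342, which is just below spam_cutoff = 0.990000. The sketch below shows that comparison, assuming a two-state decision in which a message is tagged Yes only when its spamicity reaches spam_cutoff. The function name classify is hypothetical, the 0.95 cutoff is an arbitrary illustrative value, and bogofilter's real rule may differ in detail (for example when ham_cutoff is non-zero).

```python
def classify(spamicity, spam_cutoff=0.99):
    """Two-state verdict using the tags from spamicity_tags (Yes, No)."""
    return "Yes" if spamicity >= spam_cutoff else "No"


print(classify(0.987342))                    # No
print(classify(0.987342, spam_cutoff=0.95))  # Yes
```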