speaking of random words

Tom Anderson tanderso at oac-design.com
Wed Mar 17 15:50:35 CET 2004


Check out the attached email.  I sent it to bfproxy as spam, and it was
registered 10 times without the spamicity moving from 0.000000.  This is
the result after those 10 registrations:

X-Bogosity: No, tests=bogofilter, spamicity=0.000000, version=0.16.0
   int  cnt   prob  spamicity histogram
  0.00   94 0.054958 0.025289 ####################################
  0.10   93 0.151070 0.068970 ###################################
  0.20  128 0.248740 0.145299 #########################################7
  0.30    0 0.000000 0.145299 
  0.40    0 0.000000 0.145299 
  0.50    0 0.000000 0.145299 
  0.60    0 0.000000 0.145299 
  0.70   38 0.739668 0.230877 ###############
  0.80   12 0.833886 0.259396 #####
  0.90   19 0.981330 0.349559 ########

Apparently, the long list of words at the end was common enough that it
matched my hams very closely.  Granted, some of the words showed up
spammy, but not nearly enough of them.
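
To put a number on "not nearly enough", here is a quick tally of the
histogram above.  The counts are copied straight from the cnt column; the
0.30 and 0.70 cut-offs are only my assumption, picked because the bins
between them are empty.

```python
# Rows copied from the first histogram above: (bin lower bound, token count).
HISTOGRAM = [
    (0.00, 94), (0.10, 93), (0.20, 128), (0.30, 0), (0.40, 0),
    (0.50, 0), (0.60, 0), (0.70, 38), (0.80, 12), (0.90, 19),
]


def tally(rows, neutral_low=0.30, neutral_high=0.70):
    """Return (hammy, spammy) token counts on either side of the empty band.

    The band limits are an assumption read off the histogram, not a
    bogofilter setting taken from the original message.
    """
    hammy = sum(cnt for low, cnt in rows if low < neutral_low)
    spammy = sum(cnt for low, cnt in rows if low >= neutral_high)
    return hammy, spammy


hammy, spammy = tally(HISTOGRAM)
print(f"hammy tokens:  {hammy}")
print(f"spammy tokens: {spammy}")
print(f"ratio: {hammy / spammy:.1f} to 1")
```

That is 315 hammy tokens against 69 spammy ones, so the padding words
outnumber the spammy ones by more than four to one.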

My email client is smart enough not to display images, but average
users will not only see the full marketing message but also send back a
confirmation when the images load.  So this technique seems to be
successful.

Another 10 registrations, and it's starting to get there:

X-Bogosity: Unsure, tests=bogofilter, spamicity=0.329026, version=0.16.0
   int  cnt   prob  spamicity histogram
  0.00   75 0.053009 0.022661 ###############################
  0.10   96 0.155688 0.072316 #######################################
  0.20  119 0.251772 0.145532 #########################################7
  0.30    0 0.000000 0.145532 
  0.40    0 0.000000 0.145532 
  0.50    0 0.000000 0.145532 
  0.60    0 0.000000 0.145532 
  0.70   41 0.741739 0.240782 #################
  0.80   16 0.834272 0.278161 #######
  0.90   41 0.993195 0.448945 #################

It seems as though the only defense is to register the email many times,
so that the very hammy words become more neutral and the somewhat
spammy ones become very spammy.  This is a strong case for exhaustive
training.  But of course it comes with the risk that some tokens will
become spammy enough to push hams in the wrong direction.  So far that
hasn't happened, but I'll keep my eyes peeled.
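
Here is a rough sketch of why repeated registration moves the tokens the
way I describe.  It assumes Robinson's per-token formula, f(w) = (s*x +
n*p(w)) / (s + n), which is my understanding of what bogofilter computes.
The robx and robs values and all the token counts below are made up for
illustration; they are not taken from my word list or from the 0.16.0
defaults.

```python
def robinson_f(spam_count, ham_count, spam_msgs, ham_msgs,
               robx=0.52, robs=0.0178):
    """Robinson's smoothed token probability f(w).

    robx (score for an unseen token) and robs (its weight) are
    illustrative assumptions, not values confirmed for bogofilter 0.16.0.
    """
    n = spam_count + ham_count
    if n == 0:
        return robx  # token never seen: fall back to the prior
    spam_rate = spam_count / spam_msgs
    ham_rate = ham_count / ham_msgs
    p = spam_rate / (spam_rate + ham_rate)
    return (robs * robx + n * p) / (robs + n)


def drift(spam_count, ham_count, spam_msgs, ham_msgs, registrations):
    """f(w) after registering one spam containing the token N more times."""
    return robinson_f(spam_count + registrations, ham_count,
                      spam_msgs + registrations, ham_msgs)


# Hypothetical starting counts in a word list built from 1000 hams and
# 1000 spams.
HAMMY = dict(spam_count=2, ham_count=40)    # a common, innocent word
SPAMMY = dict(spam_count=15, ham_count=5)   # a mildly spammy word

for k in (0, 10, 20, 40):
    hammy = drift(registrations=k, spam_msgs=1000, ham_msgs=1000, **HAMMY)
    spammy = drift(registrations=k, spam_msgs=1000, ham_msgs=1000, **SPAMMY)
    print(f"{k:3d} registrations: hammy {hammy:.3f}, spammy {spammy:.3f}")
```

With these made-up counts, the hammy token climbs from below 0.1 toward
the neutral region while the mildly spammy one moves well above 0.85,
which is the same drift the two histograms show.  The same arithmetic is
also the risk: a word that appears in this spam and in my real mail gets
pushed up just as hard.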

Tom

-------------- next part --------------
A non-text attachment was scrubbed...
Name: random.eml.tar
Type: application/x-tar
Size: 20480 bytes
Desc: not available
URL: <http://www.bogofilter.org/pipermail/bogofilter/attachments/20040317/57bf90c5/attachment.tar>

