base64 spam

David Relson relson at osagesoftware.com
Wed Nov 20 02:11:55 CET 2002


Greetings,

My son is the lucky recipient of OsageSoftware's first spam message totally 
encoded in base64.

Here's the header:

	From wwcap at aol.com  Tue Nov 19 19:22:10 2002
	Return-Path: <wwcap at aol.com>
	Delivered-To: eric at osagesoftware.com
	Received: from idjke (unknown [200.174.69.242])
		by osagesoftware.com (Postfix) with SMTP id 47020280E2
		for <eric at osagesoftware.com>; Tue, 19 Nov 2002 19:22:03 -0500 (EST)
	From: Ernestine Bulman <wwcap at aol.com>
	To: <eric at osagesoftware.com>
	Subject: eric, Straight from the pharmacy to you! Gen*ric V*agra $5.00
	Date: Tue, 19 Nov 2002 17:19:05 -0800
	Mime-Version: 1.0
	Content-Type: text/html
	Content-Transfer-Encoding: base64
	Message-Id: <qwrjjwkhr at aol.com>

The body is simply base64, i.e. 193 lines of 72 characters.
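
As a rough size check, the line count above puts an upper bound on the decoded body. This is only a sketch: it assumes every one of the 193 lines is a full 72 characters, and the function name is made up for illustration.

```python
def decoded_size_upper_bound(lines: int, width: int) -> int:
    """Upper bound on decoded bytes for `lines` base64 lines of `width` chars.

    Base64 maps every 4 encoded characters to 3 bytes; trailing '='
    padding (not accounted for here) can reduce the result by 1 or 2 bytes.
    """
    encoded_chars = lines * width
    return encoded_chars // 4 * 3


# Figures taken from the message described above (assumes all lines are full).
body_bytes = decoded_size_upper_bound(193, 72)
print(body_bytes)
```

So the body carries roughly 10 KB of HTML that the classifier never sees.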

FWIW, here's bogofilter's output (mostly the histogram):

	X-Bogosity: No, tests=bogofilter, spamicity=0.422789,
		version=0.8.0.cvs.20021114
	  int  cnt    prob   spamicity  histogram
		 0.00    2  0.001197  0.000208  ##
		 0.10    1  0.129550  0.010871  #
		 0.20    2  0.264822  0.049365  ##
		 0.30    3  0.356403  0.111376  ###
		 0.40   14  0.429331  0.287388  ##############
		 0.50    5  0.553832  0.338210  #####
		 0.60    4  0.635512  0.377843  ####
		 0.70    1  0.751105  0.390086  #
		 0.80    1  0.818300  0.404171  #
		 0.90    1  0.914743  0.422789  #

Running the message through bogolexer, i.e. "cat message | bogolexer -p | 
sort -u | wc", gives "35".  This is the count of unique tokens identified by 
bogofilter's lexer component, i.e. the number of tokens available for 
bogofilter to use in classifying the message.  Since the lexer also throws 
away excessively long tokens, the net effect is that the message is 
classified on its header alone, and the 193 lines of base64 that make up 
the body are ignored.
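
To illustrate what decoding would buy us, here is a minimal Python sketch. It is not bogofilter code: rough_tokens() is a made-up stand-in for the lexer (it simply drops tokens shorter than 3 or longer than 30 characters, limits chosen only for illustration), and the sample message is invented. It uses the standard email module to honor Content-Transfer-Encoding before tokenizing.

```python
import base64
import re
from email import message_from_string


def decoded_body(raw_message: str) -> str:
    """Return the body of a single-part message with its transfer encoding undone."""
    msg = message_from_string(raw_message)
    # decode=True applies the Content-Transfer-Encoding (base64 here) and returns bytes.
    payload = msg.get_payload(decode=True)
    return payload.decode("latin-1")


def rough_tokens(text: str) -> set:
    """Hypothetical tokenizer: split on non-word characters, drop very short/long tokens."""
    parts = re.split(r"[^a-z0-9$*'-]+", text.lower())
    return {t for t in parts if 3 <= len(t) <= 30}


# Invented sample message, built the same way the spam was: HTML body, base64 encoded.
html = "<html><body>Buy cheap pills now, straight from the pharmacy!</body></html>"
encoded = base64.encodebytes(html.encode("ascii")).decode("ascii")
SAMPLE = (
    "From: sender@example.com\n"
    "Mime-Version: 1.0\n"
    "Content-Type: text/html\n"
    "Content-Transfer-Encoding: base64\n"
    "\n" + encoded
)

raw_tokens = rough_tokens(encoded)               # what a non-decoding lexer sees
real_tokens = rough_tokens(decoded_body(SAMPLE))  # what it could see after decoding

print(sorted(real_tokens))
```

With the body decoded, words such as "pharmacy" and "pills" become available as tokens; without decoding, the same text yields only meaningless base64 fragments.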

Indeed, if spammers are going to be using creative spelling in their 
headers, e.g. "Gen*ric V*agra", and encoding their whole message in a block 
of base64 text, we _do_ need to deal with it.  Sigh :-(

David




