base64 spam
David Relson
relson at osagesoftware.com
Wed Nov 20 02:11:55 CET 2002
Greetings,
My son is the lucky recipient of OsageSoftware's first spam message totally
encoded in base64.
Here's the header:
From wwcap at aol.com Tue Nov 19 19:22:10 2002
Return-Path: <wwcap at aol.com>
Delivered-To: eric at osagesoftware.com
Received: from idjke (unknown [200.174.69.242])
        by osagesoftware.com (Postfix) with SMTP id 47020280E2
        for <eric at osagesoftware.com>; Tue, 19 Nov 2002 19:22:03 -0500 (EST)
From: Ernestine Bulman <wwcap at aol.com>
To: <eric at osagesoftware.com>
Subject: eric, Straight from the pharmacy to you! Gen*ric V*agra $5.00
Date: Tue, 19 Nov 2002 17:19:05 -0800
Mime-Version: 1.0
Content-Type: text/html
Content-Transfer-Encoding: base64
Message-Id: <qwrjjwkhr at aol.com>
The body is simply base64, i.e. 193 lines of 72 characters.
FWIW, here's bogofilter's output (mostly the histogram):
X-Bogosity: No, tests=bogofilter, spamicity=0.422789,
        version=0.8.0.cvs.20021114
    int   cnt  prob      spamicity  histogram
    0.00    2  0.001197  0.000208   ##
    0.10    1  0.129550  0.010871   #
    0.20    2  0.264822  0.049365   ##
    0.30    3  0.356403  0.111376   ###
    0.40   14  0.429331  0.287388   ##############
    0.50    5  0.553832  0.338210   #####
    0.60    4  0.635512  0.377843   ####
    0.70    1  0.751105  0.390086   #
    0.80    1  0.818300  0.404171   #
    0.90    1  0.914743  0.422789   #
Running the message through bogolexer, i.e. "cat message | bogolexer -p |
sort -u | wc", gives "35". This is the count of unique tokens identified by
bogofilter's lexer component, i.e. the number of tokens available for
bogofilter to use in classifying the message. Since the lexer throws away
excessively long tokens, the message ends up being classified on the header
alone (the 193 lines of base64 that comprise the body are ignored).
Indeed, if spammers are going to be using creative spelling in their
headers, e.g. "Gen*ric V*agra", and encoding their whole message in a block
of base64 text, we _do_ need to deal with it. Sigh :-(
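To illustrate what "dealing with it" could look like, here is a rough Python
sketch (not bogofilter code) that decodes a base64 body before tokenizing it.
It relies only on the standard email module; the function name
decoded_body_tokens, the sample message, and the 3-to-30 character token
limits are made up for the example and are not bogofilter's actual settings.

```python
import base64
import email
import re


def decoded_body_tokens(raw_message):
    """Return the set of word-like tokens found in the decoded body.

    Hypothetical helper: handles only single-part messages, which is
    the shape of the spam described above.
    """
    msg = email.message_from_string(raw_message)
    # decode=True undoes the Content-Transfer-Encoding (base64 here);
    # it returns None for multipart messages, hence the fallback.
    payload = msg.get_payload(decode=True) or b""
    text = payload.decode("latin-1", errors="replace").lower()
    # Crude HTML tag removal, good enough for a sketch.
    text = re.sub(r"<[^>]+>", " ", text)
    # Illustrative token rule: 3 to 30 word-like characters.
    return set(re.findall(r"[a-z0-9$'*-]{3,30}", text))


# Build a small base64-encoded HTML message to try it on.
body = base64.encodebytes(
    b"<html><body>Cheap pills, order now!</body></html>"
).decode("ascii")
raw = (
    "From: someone@example.com\n"
    "Mime-Version: 1.0\n"
    "Content-Type: text/html\n"
    "Content-Transfer-Encoding: base64\n"
    "\n" + body
)

tokens = decoded_body_tokens(raw)
print(sorted(tokens))  # ['cheap', 'now', 'order', 'pills']
```

With the body decoded first, the words hidden in the base64 block become
ordinary tokens that a classifier could score, instead of long strings that
get thrown away.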
David