How to avoid s p lit up wor ds?
David Relson
relson at osagesoftware.com
Fri Jan 17 22:06:24 CET 2003
At 03:45 PM 1/17/03, Barry Gould wrote:
>At 12:29 PM 1/17/2003, David Relson wrote:
>>Do you remember what the spam was using to split up the words? I'll do
>>some experiments using bogolexer, but it'd be helpful to know what the
>>original looked like.
>
>Hi David, I have a spam from yesterday which used lots of HTML comments to
>break up words. Using Bogofilter 0.8.0, it got a score of 0.000000
>
>It's attached as GZIP'd text.
>
>I think I've also seen others which use lots of font tags to obscure
>the plaintext.
>
>Thanks,
>Barry
Barry,
Thanks for the sample. It's pretty nasty looking. My program and wordlists
gave it a score of 0.304952. The bad news is that the HTML tags _do_ break
up the words. We'll have to take a look at that ...
David
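
To illustrate the problem discussed above, here is a minimal Python sketch of how HTML comments inserted in the middle of words change the tokens a word-based filter sees. This is not bogofilter's lexer: the `tokens` helper, both regular expressions, and the sample string are made up for illustration (the sample is not taken from Barry's attachment). It only assumes that tokens are runs of letters and that a comment runs from `<!--` to the next `-->`.

```python
import re

# Hypothetical patterns for illustration only; bogofilter's real lexer differs.
COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)  # one HTML comment, non-greedy
TOKEN_RE = re.compile(r"[A-Za-z]+")                # a token is a run of letters


def tokens(text, strip_comments):
    """Return the letter-only tokens in text, optionally removing HTML comments first."""
    if strip_comments:
        text = COMMENT_RE.sub("", text)
    return TOKEN_RE.findall(text)


# Made-up sample in the style described in this thread.
sample = "Get your mort<!--xyz-->gage ap<!--q-->proved today"

print(tokens(sample, strip_comments=False))
print(tokens(sample, strip_comments=True))
```

With the comments left in, "mortgage" never appears as a token; the filter sees the fragments "mort" and "gage" instead, which are unlikely to be in a spam wordlist. Removing the comments before tokenizing rejoins the fragments. Font tags and other markup placed inside words would need similar handling, which this sketch does not attempt.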
Return-Path: <>
Delivered-To: relson at osagesoftware.com
Received: from inetsrv01.digitalrealm.net (inetsrv01.digitalrealm.net [216.144.192.101])
by osagesoftware.com (Postfix) with ESMTP id 5F7B62838C
for <relson at osagesoftware.com>; Fri, 17 Jan 2003 16:07:33 -0500 (EST)
Received: from localhost (localhost)
by inetsrv01.digitalrealm.net (8.11.6/linuxconf) id h0HL7S402840;
Fri, 17 Jan 2003 16:07:28 -0500
Date: Fri, 17 Jan 2003 16:07:28 -0500
From: Mail Delivery Subsystem <MAILER-DAEMON at inetsrv01.digitalrealm.net>
Message-Id: <200301172107.h0HL7S402840 at inetsrv01.digitalrealm.net>
To: <relson at osagesoftware.com>
MIME-Version: 1.0
Content-Type: multipart/report; report-type=delivery-status;
boundary="h0HL7S402840.1042837648/inetsrv01.digitalrealm.net"
Subject: Returned mail: see transcript for details
Auto-Submitted: auto-generated (failure)
X-Bogosity: No, tests=bogofilter, spamicity=0.000000, version=0.10.2.tst.20030210
int   cnt  prob      spamicity  histogram
0.00   25  0.016069  0.006832   #########################
0.10   13  0.154904  0.040149   #############
0.20    7  0.236315  0.066065   #######
0.30    7  0.334311  0.101251   #######
0.40    0  0.000000  0.101251
0.50    0  0.000000  0.101251
0.60    8  0.651526  0.184771   ########
0.70    1  0.706763  0.194809   #
0.80    2  0.841598  0.222570   ##
0.90    0  0.000000  0.000000
This is a MIME-encapsulated message
--h0HL7S402840.1042837648/inetsrv01.digitalrealm.net
The original message was received at Fri, 17 Jan 2003 16:07:24 -0500
from osagesoftware.com [216.144.204.42]
----- The following addresses had permanent fatal errors -----
<relson at digitalrealm.net>
----- Transcript of session follows -----
550 5.1.1 <relson at digitalrealm.net>... User unknown
--h0HL7S402840.1042837648/inetsrv01.digitalrealm.net
Content-Type: message/delivery-status
Reporting-MTA: dns; inetsrv01.digitalrealm.net
Arrival-Date: Fri, 17 Jan 2003 16:07:24 -0500
Final-Recipient: RFC822; relson at digitalrealm.net
Action: failed
Status: 5.1.1
Last-Attempt-Date: Fri, 17 Jan 2003 16:07:28 -0500
--h0HL7S402840.1042837648/inetsrv01.digitalrealm.net
Content-Type: message/rfc822
Return-Path: <relson at osagesoftware.com>
Received: from osagesoftware.com (osagesoftware.com [216.144.204.42])
by inetsrv01.digitalrealm.net (8.11.6/linuxconf) with ESMTP id h0HL7Od02837
for <relson at digitalrealm.net>; Fri, 17 Jan 2003 16:07:24 -0500
Received: from maple.osagesoftware.com (maple.osagesoftware.com [192.168.1.20])
by osagesoftware.com (Postfix) with ESMTP id BC50D2838C
for <relson at digitalrealm.net>; Fri, 17 Jan 2003 16:07:27 -0500 (EST)
Message-Id: <4.3.2.7.2.20030117160711.00dbeb40 at mail.osagesoftware.com>
X-Sender: relson at mail.osagesoftware.com
X-Mailer: QUALCOMM Windows Eudora Version 4.3.2
Date: Fri, 17 Jan 2003 16:07:25 -0500
To: relson at digitalrealm.net
From: David Relson <relson at osagesoftware.com>
Subject: test 01/17 16:07
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; format=flowed
--------------------------------------------------------
David Relson Osage Software Systems, Inc.
relson at osagesoftware.com Ann Arbor, MI 48103
www.osagesoftware.com tel: 734.821.8800
--h0HL7S402840.1042837648/inetsrv01.digitalrealm.net--