html comment processing

Tony L. Svanstrom tony at svanstrom.com
Sun Mar 30 15:08:47 CEST 2003


On Sat, 29 Mar 2003 the voices made David Relson write:

DR> The question at hand is "How should bogofilter define html comments?"

DR> Also of note, today there has been a discussion titled "It's getting
DR> worse", which is about spam with html comments lacking hyphens.  The
DR> "Please vis<!..>it our ..." sample is from a message in the
DR> discussion.  The attached patch fixes that problem as well.
DR>
DR> For the html purists, I propose to add a config file option named
DR> "strict_comment".  A value of "true" will cause bogofilter to follow the
DR> standard and a value of "false" will work as described above.  The default
DR> value will be "false".
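
A minimal sketch of the two behaviours the proposed "strict_comment" option would switch between. It is written in Python for illustration only; bogofilter itself is C, and the function name strip_comments is hypothetical, not bogofilter's API. The "strict" pattern is a simplification (only <!-- ... --> counts as a comment), not a full implementation of the SGML comment-declaration rules.

```python
import re

# Hypothetical helper, not bogofilter code: illustrates the proposed option.

# Strict mode (simplified): only "<!--" ... "-->" is treated as a comment.
STRICT_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)

# Lenient mode: anything from "<!" up to the next ">" is dropped, which also
# catches hyphen-less junk such as "<!..>" and SGML declarations.
LENIENT_COMMENT = re.compile(r"<!.*?>", re.DOTALL)


def strip_comments(text: str, strict_comment: bool = False) -> str:
    """Remove HTML comments; the default mirrors strict_comment=false."""
    pattern = STRICT_COMMENT if strict_comment else LENIENT_COMMENT
    return pattern.sub("", text)


sample = "Please vis<!..>it our site"
print(strip_comments(sample))                       # Please visit our site
print(strip_comments(sample, strict_comment=True))  # Please vis<!..>it our site
```

The lenient pattern stops at the first ">", so a real comment that contains ">" leaves its tail behind; that is the trade-off for catching the hyphen-less spam trick.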

 IMNSHO, the purists are idiots if they'd rather follow specs than deal with
what's going on in the real world; and if they are true purists, they'd be
complaining about any simple s/this.*?that//g approach to removing comments
anyway.
 (True purists would of course reject any and all e-mails containing X/HTML
that doesn't validate. ;-))
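
The "?" in s/this.*?that//g matters: it makes the match non-greedy, so each comment is removed on its own instead of everything between the first opener and the last closer. A small Python illustration (the sample string is made up):

```python
import re

text = "a<!--x-->b<!--y-->c"

# Greedy: ".*" runs to the LAST "-->", swallowing the visible "b".
greedy = re.sub(r"<!--.*-->", "", text)

# Non-greedy: ".*?" stops at the FIRST "-->", as in s/this.*?that//g.
lazy = re.sub(r"<!--.*?-->", "", text)

print(greedy)  # ac
print(lazy)    # abc
```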

 Spammers send both relevant data and filler data (such as Bayesbusters in the
form of comments, SGML declarations, etc.), and I think the filler data should
be ignored as much as possible, so that bogofilter focuses on the headers and
the content the user actually sees.
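
One way to keep only "the content the user actually sees" is to let an HTML parser classify the markup and keep just the text nodes. The sketch below uses Python's standard html.parser module rather than regexes; it is an assumed alternative technique, not how bogofilter's lexer works, and the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collects only text a reader would see (hypothetical, not bogofilter)."""

    # Elements whose contents are never rendered as text.
    HIDDEN = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []
        self.hidden_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.HIDDEN:
            self.hidden_depth += 1

    def handle_endtag(self, tag):
        if tag in self.HIDDEN and self.hidden_depth:
            self.hidden_depth -= 1

    def handle_data(self, data):
        if not self.hidden_depth:
            self.parts.append(data)

    # Comments (including malformed ones like "<!..>") and declarations such
    # as <!DOCTYPE ...> are filler. The base class already ignores them; the
    # hooks are spelled out here only to make that explicit.
    def handle_comment(self, data):
        pass

    def handle_decl(self, decl):
        pass


def visible_text(html: str) -> str:
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


print(visible_text("<!DOCTYPE html><p>Please vis<!..>it our site</p>"))
```

A parser-based approach avoids the "first >" problem of a single regex, at the cost of depending on how the parser recovers from broken markup.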


-- 
      /\___/\                                              /\___/\
      \_@ @_/                                              \_@ @_/
 +--oOO-(_)-OOo------------------------------------------oOO-(_)-OOo--+
 | Per scientiam ad libertatem! // Through knowledge towards freedom! |
 +---ôôô---ôôô--------------------------------------------ôôô---ôôô---+
     \O/   \O/      (c)1998-2003  tony at svanstrom.com      \O/   \O/
