mass processing with mutt and Fcc

David Relson relson at osagesoftware.com
Tue Apr 1 15:23:34 CEST 2003


At 08:01 AM 4/1/03, Boris 'pi' Piwinger wrote:

>David Relson wrote:
>
> > Bogofilter looks at nearly all the tokens of a message.  Some stuff is
> > ignored - for example message IDs, because they tend to be unique, and
>
>All of them? The local part could be interesting.
>
> > innards of html tags
>
>What exactly?
>
>pi

At present, when processing html, bogofilter discards html comments, 
valid html tags (and their innards), and invalid html tags (and their 
innards).  In short, everything between angle brackets is currently 
ignored.
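
To illustrate the effect described above, here is a minimal sketch in 
Python.  It is not bogofilter's code (the real rules live in the lexer); 
the regular expressions and the function name visible_tokens are 
assumptions made up for this example.

```python
import re

# Assumption: a regex stands in for the real lexer rules.
# The comment pattern comes first so that a '>' inside an html
# comment does not end the match early.
ANGLE_BRACKETED = re.compile(r"<!--.*?-->|<[^>]*>", re.DOTALL)

# Assumption: a token is simply a run of letters and digits.
WORD = re.compile(r"[A-Za-z0-9]+")


def visible_tokens(html_text):
    """Return lowercase tokens after dropping everything in angle brackets."""
    stripped = ANGLE_BRACKETED.sub(" ", html_text)
    return [token.lower() for token in WORD.findall(stripped)]


sample = '<font color="#3D3D3D">Buy <!-- x9q7 > junk -->now</font> <bogus tag>cheap'
print(visible_tokens(sample))
```

The tag name, the color value, the comment text, and the invalid tag all 
vanish; only the words outside the angle brackets are left for scoring.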

The rationale is that many tokens within html tags are not worth scoring 
as spam indicators.  The html keywords themselves are very common, hence 
have little diagnostic value.  Named colors (black, white, etc) are also 
common, while other colors (3D3D3D, 11FFAF, etc) are hex values and have 
too many possible values to be useful.  Html comments can include any 
kind of random garbage.

Plans include options so a user can specify whether bogofilter uses any (or 
all) of these tokens for scoring.

I can't say when this will be implemented.  The additional code belongs 
in the lexer, and I'm not good at modifying the grammar.  A volunteer 
would be very helpful!

David





