Prediction [was: spam addrs]

Peter Bishop pgb at adelard.com
Thu Jul 1 10:32:27 CEST 2004


On 29 Jun 2004 at 18:06, Tom Anderson wrote:

> > I've looked at spamitarium's regexes and confess that, to my
> > inexperienced eye, they're complex.  Give me a simple rule for
> > distinguishing them and I can try to implement it.
> 
> I don't think there is a simple rule like you propose.  Due to the different
> formats given by different MTAs, and the ability for spammers to forge one
> or more fields, it requires a complex expression.  Brackets and parentheses
> are optional in many cases, IP and rDNS and IDENT information may or may not
> be present, and these elements may all be arranged in many different ways.
> For instance, here are a few:

Could I suggest that you let the *end user* specify the format of the
MTA Received line?

e.g. if the user wants the line processed
1) specify the received line template for *their* specific MTA
2) extract the required address
3) insert a special header line into the message:

e.g. the template:
 
^Received: from .*  \[([0-9\.]+)\] 	by .*

would be sufficient for my MTA, and this could be mapped to:

MTA-IP-Address: $1

This could be processed and scored by bogofilter in the normal way.
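
As a rough illustration of the three steps above, here is a minimal Python sketch of a pre-filter that could run before bogofilter. It is not part of bogofilter or spamitarium. The function name add_mta_ip_header is made up for this example, and the template is a variant of the one above with the header name dropped (Python's email module returns the header value without it) and the whitespace normalised before matching.

```python
import re
from email import message_from_string

# Assumption: the user's MTA writes "from <host> [<ip>] by <host>".
# The header name is not part of the pattern because the email module
# hands back only the header value.
TEMPLATE = re.compile(r"^from .*\[([0-9.]+)\] by ")


def add_mta_ip_header(raw_message, template=TEMPLATE, header="MTA-IP-Address"):
    """Return raw_message with an extra header holding the captured IP.

    The message is returned without the extra header if no Received
    line matches the template.
    """
    msg = message_from_string(raw_message)
    for value in msg.get_all("Received", []):
        # Received lines are usually folded over several physical lines;
        # collapse all runs of whitespace so the template sees one line.
        unfolded = " ".join(value.split())
        match = template.search(unfolded)
        if match:
            # The topmost matching Received line is the one added by the
            # user's own MTA, so stop at the first hit.
            msg[header] = match.group(1)
            break
    return msg.as_string()
```

The result could then be piped to bogofilter, which would tokenise the new header like any other line.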

The MTA received line processing could be done using procmail and 
formail before bogofilter is called.

Alternatively something like spamitarium could easily be modified to
do a similar job.

OR maybe bogofilter could allow a user-specified MTA template, 
e.g. specified as an option in bogofilter.cf
to identify the required IP address field. 

This would need some template matching code in bogofilter, 
but maybe we could just invoke a standard regex library.

To allow for different MTAs, you could leave some commented out 
templates in the config file for the user to select, 
but if the MTA is not in the list he can still "roll his own".
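
The "pick one from the list or roll your own" idea could be sketched like this in Python. Everything here is hypothetical: the preset names and patterns are invented for illustration and do not describe the output of any real MTA, and bogofilter has no such option today.

```python
import re

# Hypothetical presets, invented for this example only.
PRESET_TEMPLATES = {
    "bracketed": r"^Received: from .*\[([0-9.]+)\]\s+by ",
    "parenthesised": r"^Received: from .*\(([0-9.]+)\)\s+by ",
}


def resolve_template(setting):
    """Turn a config value into a compiled template.

    The value is either the name of a preset or, if no preset has
    that name, a regular expression supplied by the user.
    """
    pattern = PRESET_TEMPLATES.get(setting, setting)
    compiled = re.compile(pattern)
    # The template is only useful if it captures the IP address.
    if compiled.groups < 1:
        raise ValueError("template needs a capture group for the IP address")
    return compiled
```

A user whose MTA is not in the list would simply put his own regular expression where a preset name would otherwise go.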


-- 
Peter Bishop 
pgb at adelard.com
pgb at csr.city.ac.uk




