Templates [was: Prediction ...]

Tom Allison tallison at tacocat.net
Sat Jul 3 02:01:53 CEST 2004


Tom Anderson wrote:
> From: "Tom Allison" <tallison at tacocat.net>
> 
>>What I've seen is that >>99% of my spam is from ip addresses that only
>>send one message over 4 months time.  So there is little net improvement
>>on detecting spam since almost every IP address will be dominated on the
>>basis of robx/robs settings.  So there is not much effect on detecting
>>spam.
> 
> This discussion is not about detecting IPs for filtering, but for outputting
> IPs in the logs so that people can use them for a blacklist.  Bogofilter
> already uses IPs for filtering.  No big deal there, since it's just another
> token taken together with the rest of them in the message.  But if you're
> going to single out a token and say, "this is definitely the IP of the
> connecting mail server, use it in your blacklist," then it's much more
> important to be absolutely certain that it really is.  The fact that there
> is uncertainty and the difficulty in obtaining any measure of certainty is
> the heart of this discussion.  I've suggested not adding this functionality
> because of this.
> 

That's what I get for barging into a conversation.  :)

As for the IPs for filtering:
Sounds like something that could well be managed by a brief perl script 
to parse out the Received: headers for identified spam and load them 
into an access list suitable for your MTA.

The main problem is identifying a proper regex that parses out the IP 
address correctly.  I do think perl could do this really well in one 
line.

For example:
gizmo11ps.bigpond.com (gizmo11ps.bigpond.com [144.140.71.21])
         by cling.tacocat.net (Postfix) with SMTP id 5F3C54C081

Should work out to:
/(\d+\.\d+\.\d+\.\d+).+?by \Q$fqdn_localhost\E/so
which should set $1 to the IP address every time.  (The /s lets "." 
match the line break in the folded header, and \Q...\E keeps the dots in 
the hostname literal.)
Applied to a single Received: header, this takes the dotted quad to the 
left of the string "by cling.tacocat.net", which is going to be the 
identifying string of the connecting server to your localhost machine 
(cling.tacocat.net for me, of Cling and Clang fame).
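
A minimal sketch of the same idea, written in Python rather than perl so 
it can be run as-is.  The names LOCAL_FQDN and connecting_ip are made up 
for illustration.  Instead of one lazy match it first finds the "by 
<hostname>" clause and then takes the last dotted quad before it, so a 
header holding more than one address still yields the one nearest to the 
"by" clause.

```python
import re

# Hypothetical local hostname; substitute your own MTA's FQDN.
LOCAL_FQDN = "cling.tacocat.net"

# Four groups of 1-3 digits separated by dots (no range check on each octet).
DOTTED_QUAD = re.compile(r"\b(\d{1,3}(?:\.\d{1,3}){3})\b")


def connecting_ip(received, local_fqdn=LOCAL_FQDN):
    """Return the dotted quad nearest to the left of 'by <local_fqdn>', or None."""
    # \s+ also matches the newline and indentation of a folded header.
    marker = re.search(r"\bby\s+" + re.escape(local_fqdn) + r"\b", received)
    if marker is None:
        return None  # this hop was not stamped by our own server
    candidates = DOTTED_QUAD.findall(received[:marker.start()])
    return candidates[-1] if candidates else None


header = (
    "gizmo11ps.bigpond.com (gizmo11ps.bigpond.com [144.140.71.21])\n"
    "         by cling.tacocat.net (Postfix) with SMTP id 5F3C54C081"
)
print(connecting_ip(header))  # prints 144.140.71.21
```

A header that does not name the local host after "by" returns None, 
which is the cue to move on to the next Received: header.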

Even with amavisd this will work, as the subsequent lines say 
'localhost' and not the FQDN (I think).  That, or skip any match that is 
127.0.0.1.
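
The whole job could be sketched roughly as below, again in Python and 
under stated assumptions: the hostname, the function names and the 
sample message (including its amavisd-style hops and queue IDs) are 
invented for illustration, and real headers from your own setup may look 
different.  It walks the Received: headers, ignores hops not stamped by 
the local FQDN, skips loopback addresses, and formats the result as a 
"pattern action" line of the kind a Postfix access table takes.

```python
import re
from email import message_from_string

LOCAL_FQDN = "cling.tacocat.net"  # hypothetical; use your MTA's hostname
DOTTED_QUAD = re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b")


def external_ip(raw_message, local_fqdn=LOCAL_FQDN):
    """Return the first non-loopback IP handed to local_fqdn, or None."""
    msg = message_from_string(raw_message)
    by_local = re.compile(r"\bby\s+" + re.escape(local_fqdn) + r"\b")
    # Received: headers are listed newest first, so any content-filter
    # hops appear before the hop where the outside server connected.
    for received in msg.get_all("Received", []):
        marker = by_local.search(received)
        if marker is None:
            continue  # stamped 'by localhost' or by some other host
        ips = DOTTED_QUAD.findall(received[:marker.start()])
        if ips and not ips[-1].startswith("127."):
            return ips[-1]
    return None


def access_line(ip, action="REJECT"):
    """Format one 'pattern action' line for an MTA access list."""
    return f"{ip} {action}"


# Made-up message: two loopback hops from a content filter, then the real one.
sample = (
    "Received: from localhost (localhost [127.0.0.1])\n"
    "\tby cling.tacocat.net (Postfix) with ESMTP id AAAA\n"
    "Received: from cling.tacocat.net ([127.0.0.1])\n"
    "\tby localhost (amavisd-new) with ESMTP id BBBB\n"
    "Received: from gizmo11ps.bigpond.com (gizmo11ps.bigpond.com [144.140.71.21])\n"
    "\tby cling.tacocat.net (Postfix) with SMTP id 5F3C54C081\n"
    "Subject: test\n"
    "\n"
    "body\n"
)

ip = external_ip(sample)
if ip is not None:
    print(access_line(ip))  # prints 144.140.71.21 REJECT
```

Run over a folder of identified spam, the printed lines could be 
collected into a file and reviewed before being handed to the MTA.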

I don't know much about C.  But this is effectively the rule to put into 
place.  I would expect that if there are any cases where the IP address 
is not provided as a numerical octet, then you're kind of S.O.L.

Postfix already provides 99% of this in its new version, under the 
topics of greylisting and "Access Policy Delegation".  It's extremely 
effective at what it can do.

From a functional view, I think bogofilter should stick to the 
"Mission" of filtering the mail that is delivered to your mailbox into 
selections of spam/ham.  My relatively incomplete implementation of 
postfix UCE capabilities has introduced a new problem into spam 
filtering for me.  I can't get enough spam to do anything meaningful 
with respect to bogofilter.

I haven't exceeded 37 spam in 24 hours since I installed it.
I'm rejecting >550 (probable) spam at the MTA level.

I believe that this is where the spam blocking is most effective since 
it's not adding any overhead to my machine to turn the email away at the 
wire interface.


