Learning Backscatter

Thu Jan 15 16:54:58 CET 2009

On Thu, 15 Jan 2009 08:56:04 -0500
Thomas Anderson <tanderson at orderamidchaos.com> wrote:

> On Sat, 2009-01-10 at 17:47 +0000, RW wrote:
> > I was wondering how good Bogofilter is at distinguishing between
> > backscatter, and legitimate delivery failure messages. 
> > 
> > Specifically, does it look inside the attached original email?
> 
> It's not too good at it.  I do train my spam bounces, but it tends to
> make legitimate bounces spammy as well.  What I do is weed out known
> backscatterers first via the ips.backscatterer.org block list, and
> then (after bogofilter) I push all bounces, spam or not, into a
> separate folder for review.

In the end I decided to just have sieve file them into a folder and
auto-delete them after a few days with find. Since joe-jobs tend to
come in bursts, it seems like a reasonable plan. I have a second
account for mailing lists, and in that I'll just learn all delivery
failures as spam.
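
For anyone who would rather not rely on find, the clean-up step could
also be a few lines of Python run from cron. This is only a rough
sketch: the function name, the folder argument and the three-day
default are my own placeholders, and it assumes one message per file
(as in Maildir).

```python
import time
from pathlib import Path


def purge_old_messages(folder, max_age_days=3, now=None):
    """Delete files under `folder` last modified more than max_age_days ago.

    Hypothetical helper: assumes one message per file (Maildir style).
    Returns the sorted names of the files that were removed.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    removed = []
    # Materialise the listing first so that deleting files does not
    # interfere with the directory walk.
    for path in list(Path(folder).rglob("*")):
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path.name)
    return sorted(removed)
```

It goes by modification time, so a burst of joe-job bounces ages out
together a few days after it arrives.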

FWIW what I was thinking of doing is to configure my mail clients to
add a custom header containing a dozen or so unique tokens, and then
create some deliberate bounces for Bogofilter to learn. I think that
would probably work well - except for the minority of bounces that
don't return the original headers.
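
To illustrate the header idea, something like the following would
generate the tokens once, and the resulting line could then be pasted
into the mail client's custom-header setting. The header name
"X-Bounce-Tokens" and the "bt" prefix are made up for this example;
nothing here is specific to Bogofilter.

```python
import secrets


def make_token_header(count=12, name="X-Bounce-Tokens"):
    """Build one header line carrying `count` random tokens.

    The header name and the "bt" prefix are hypothetical; the only
    point is that the tokens never occur in anyone else's mail.
    """
    tokens = ["bt" + secrets.token_hex(4) for _ in range(count)]
    return f"{name}: {' '.join(tokens)}"


if __name__ == "__main__":
    # Generate once and reuse the same line in every client, so the
    # tokens stay stable between training and later bounces.
    print(make_token_header())
```

The tokens have to stay fixed once bounces have been learned, which is
why this is run once rather than per message.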

The best solution I've seen is the one used by Tuffmail, whereby they
rewrite the SMTP "MAIL FROM" address into a disposable address that
stops accepting mail after a few days, and then block DSNs to the real
address. There are a few caveats, so some people may need to turn it
off on some addresses, but in my experience it works very well. From
the look of it, it's just done through some simple hashing - probably
just a few lines of Perl.
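
I don't know how Tuffmail actually implements it, but a scheme along
those lines might look like the sketch below (in Python rather than
Perl). The address layout "local+day-tag@domain", the secret and the
seven-day window are all my own assumptions; it is similar in spirit
to BATV, not a copy of any real implementation.

```python
import hashlib
import hmac

SECRET = b"change-me"  # hypothetical per-site secret


def _tag(local, domain, day):
    """Short keyed hash binding the address to the day it was issued."""
    msg = f"{day}:{local}@{domain}".encode()
    return hmac.new(SECRET, msg, hashlib.sha256).hexdigest()[:8]


def disposable_sender(address, day):
    """Rewrite user@domain into user+DAY-TAG@domain for the envelope.

    `day` is a day number, e.g. int(time.time() // 86400).
    """
    local, domain = address.rsplit("@", 1)
    return f"{local}+{day}-{_tag(local, domain, day)}@{domain}"


def accepts_bounce(recipient, today, max_age_days=7):
    """Decide whether a DSN addressed to `recipient` should be accepted."""
    local_part, domain = recipient.rsplit("@", 1)
    if "+" not in local_part:
        return False  # DSN to the real address: we never sent with it
    local, _, suffix = local_part.rpartition("+")
    day_text, _, tag = suffix.partition("-")
    if not (day_text.isascii() and day_text.isdigit()):
        return False
    day = int(day_text)
    if not 0 <= today - day <= max_age_days:
        return False  # the disposable address has expired
    expected = _tag(local, domain, day)
    return hmac.compare_digest(tag.encode(), expected.encode())
```

Outgoing mail would use disposable_sender() for the envelope sender
only, and the check would run on incoming messages with a null
envelope sender. Anything that legitimately sends DSN-style mail to
the plain address is rejected too, which is presumably one of the
caveats mentioned above.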