[bogofilter] spamitarium [was: using block_on_subnets]

Tom Anderson tanderso at oac-design.com
Fri Apr 30 20:18:42 CEST 2004


From: "Jason A. Smith" <jazbo at jazbo.dyndns.org>
> By stripping the bogus Received headers, aren't you throwing away
> potentially useful information which would help bogofilter classify the
> email as spam?  I would think those headers contain valuable data about

Others may disagree, but think about it this way: the spammer intentionally
inserted fake received lines because he thought that it would tend to
decrease the effectiveness of your filter.  I agree with the spammers that
bogus routing information may cause the email to look hammier than
otherwise.  Sometimes it might, sometimes it might not.  But either way, it
is just extra noise... a red herring to throw off the real trail.  I'd
prefer to filter on true information than play the spammer's game.  Plus,
the very fact that the line is fake is recorded as "rcvd:untrusted", which
helps to classify it correctly as spam, whereas the alternative may look
hammy.

> the way that their spam software works.  Can your filter do ASN lookups
> on only the valid Received headers, skipping the bogus ones but also
> keeping those bogus headers in the email?

Not exactly.  If you don't pass the "d" flag, spamitarium probably won't be
able to evaluate the validity of the headers, so it may do what you want.
However, that's not intended functionality.  If you'd like to write a patch
adding an extra parameter for that, I'd be happy to include it.  Writing it
myself is not a high priority, though.

> I am going to try your filter because adding useful data like this can
> only help bogofilter, but I wouldn't want to remove potentially useful
> data from the spam email.

Bogofilter already ignores HTML comments.  It is the same concept.  The
potential for usefulness of this data is akin to the potential for spam to
actually contain a useful offer that you want to click on.  Not enough to
bother looking at it.  Why pollute your wordlist with known-fake noise?

> PS.  Can your filter process multiple email messages in a single
> file/pipe (mbox format)?  I would like to try processing my ham/spam
> corpus through your filter to add these ASNs and then regenerate my
> bogofilter wordlist.

No, it accepts a single email on STDIN.  It has not been tested in any other
capacity.  You could, of course, loop through your mbox with a shell script.
The primary purpose of spamitarium is to prefilter individual emails in
procmail before bogofilter sees them, not to serve as a stand-alone utility.
I would, of course, be willing to modify it for that purpose if there were
enough interest.
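For illustration, the per-message loop could look like the sketch below. It uses Python's standard mailbox module in place of the shell script suggested above, and it assumes that spamitarium is invoked as a plain "spamitarium" command that reads one message on STDIN and writes the rewritten message to STDOUT.  The command name, its path and any flags are assumptions; adjust them to your installation.

```python
import mailbox
import subprocess

# Hypothetical invocation: assumes spamitarium reads one message on STDIN
# and prints the filtered message on STDOUT. Add flags as needed.
DEFAULT_FILTER = ("spamitarium",)


def filter_mbox(src_path, dst_path, filter_cmd=None):
    """Pipe each message of the mbox at src_path through filter_cmd,
    one process per message, and append each filtered result to the
    mbox at dst_path. Returns the number of messages processed."""
    cmd = list(filter_cmd or DEFAULT_FILTER)
    src = mailbox.mbox(src_path, create=False)
    dst = mailbox.mbox(dst_path)  # created if it does not exist yet
    count = 0
    try:
        dst.lock()
        for msg in src:
            # as_bytes() omits the mbox "From " separator line, so the
            # filter sees a single ordinary RFC 822 message on STDIN.
            result = subprocess.run(
                cmd,
                input=msg.as_bytes(),
                stdout=subprocess.PIPE,
                check=True,  # stop on the first filter failure
            )
            # mailbox adds a fresh "From " separator for the new entry.
            dst.add(result.stdout)
            count += 1
        dst.flush()
    finally:
        dst.unlock()
        dst.close()
        src.close()
    return count
```

Running one filter process per message keeps to the single-email-on-STDIN contract described above, so nothing in the filter itself has to change.  The filtered copies can then be used to regenerate the bogofilter wordlist in whatever way you normally train it.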

Tom


