[bogofilter] spamitarium [was: using block_on_subnets]

Tom Allison tallison at tacocat.net
Sat May 1 04:13:34 CEST 2004


Tom Anderson wrote:
> From: "Jason A. Smith" <jazbo at jazbo.dyndns.org>
> 
>>By stripping the bogus Received headers, aren't you throwing away
>>potentially useful information which would help bogofilter classify the
>>email as spam?  I would think those headers contain valuable data about
> 
> 
> Others may disagree, but think about it this way: the spammer intentionally
> inserted fake received lines because he thought that it would tend to
> decrease the effectiveness of your filter.  I agree with the spammers that
> bogus routing information may cause the email to look hammier than
> otherwise.  Sometimes it might, sometimes it might not.  But either way, it
> is just extra noise... a red herring to throw off the real trail.  I'd
> prefer to filter on true information than play the spammer's game.  Plus,
> the very fact that the line is fake is recorded as "rcvd:untrusted", which
> helps to classify it correctly as spam, whereas the alternative may look
> hammy.
> 

It seems that if you could successfully remove all the chaff it would 
only improve things.  Consider bogofilter and Bayesian filtering in 
general.  Until Bayes was proven effective, spammers didn't load the 
bottom of the body with random dictionary spews.

Couldn't the same reasoning be applied to Received headers?

If we could accurately remove the dictionary spews, Bayesian filtering 
of all types would be very accurate.

I would suspect that similarly removing all the Received junk would help 
determine the spamicity of the ASN.
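
As a rough illustration of "removing the Received junk", here is a
minimal Python sketch. It is not how spamitarium does it. It simply
assumes that only the newest few Received headers (the ones added by
relays you control) can be trusted and drops the rest. The function name
and the trusted_hops parameter are made up for this example.

```python
from email import message_from_string


def strip_untrusted_received(raw_message, trusted_hops=1):
    """Keep only the newest `trusted_hops` Received headers.

    Assumption: Received headers are prepended by each relay, so the
    first ones in the message are the most recent, and anything past
    the hops we control could have been forged by the sender.
    """
    msg = message_from_string(raw_message)
    received = msg.get_all("Received") or []
    # Deleting by name removes every occurrence of the header.
    del msg["Received"]
    # Assigning by name appends a header, so the kept ones are re-added
    # in their original relative order (at the end of the header block).
    for value in received[:trusted_hops]:
        msg["Received"] = value
    return msg
```

How many hops to trust depends entirely on your own mail setup, so
treat the default of 1 as a placeholder.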

However, I think the real test would be to run a few thousand emails 
through bogofilter and compare the scores with and without the 
spamitarium contribution to the emails.
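
A sketch of how such a comparison could be tallied, assuming you already
have, for each message, a score from the unmodified mail, a score from
the cleaned mail, and a hand-verified spam/ham label. The function
names and the 0.5 cutoff are placeholders for illustration, not
bogofilter settings.

```python
def error_counts(scores, labels, cutoff=0.5):
    """Return (false_positives, false_negatives) at the given cutoff.

    scores: spamicity values in [0, 1]; labels: True for known spam.
    """
    pairs = list(zip(scores, labels))
    false_pos = sum(1 for s, is_spam in pairs if s >= cutoff and not is_spam)
    false_neg = sum(1 for s, is_spam in pairs if s < cutoff and is_spam)
    return false_pos, false_neg


def compare_runs(raw_scores, cleaned_scores, labels, cutoff=0.5):
    """Compare error counts for the same corpus scored two ways."""
    return {
        "raw": error_counts(raw_scores, labels, cutoff),
        "cleaned": error_counts(cleaned_scores, labels, cutoff),
    }


if __name__ == "__main__":
    # Tiny made-up corpus, only to show the call; real data would come
    # from scoring each message twice.
    labels = [True, True, False, False]
    raw = [0.9, 0.4, 0.2, 0.6]
    cleaned = [0.95, 0.7, 0.1, 0.3]
    print(compare_runs(raw, cleaned, labels))
```

If the cleaned run shows fewer false positives and false negatives over
a large enough corpus, that would support stripping the headers; if not,
the extra processing isn't buying anything.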
