[bogofilter] spamitarium [was: using block_on_subnets]
tallison at tacocat.net
Fri Apr 30 22:13:34 EDT 2004
Tom Anderson wrote:
> From: "Jason A. Smith" <jazbo at jazbo.dyndns.org>
>>By stripping the bogus Received headers, aren't you throwing away
>>potentially useful information which would help bogofilter classify the
>>email as spam? I would think those headers contain valuable data about
> Others may disagree, but think about it this way: the spammer intentionally
> inserted fake received lines because he thought that it would tend to
> decrease the effectiveness of your filter. I agree with the spammers that
> bogus routing information may cause the email to look hammier than
> otherwise. Sometimes it might, sometimes it might not. But either way, it
> is just extra noise... a red herring to throw off the real trail. I'd
> prefer to filter on true information than play the spammer's game. Plus,
> the very fact that the line is fake is recorded as "rcvd:untrusted", which
> helps to classify it correctly as spam, whereas the alternative may look
It seems that if you could successfully remove all the chaff it would
only improve things. Consider bogofilter and Bayesian filtering in
general. Until Bayes was proven effective, spammers didn't load the
bottom of the body with random dictionary spews.
Couldn't the same reasoning be applied to Received headers?
If we could accurately remove the dictionary spews, Bayesian filtering
of all types would be very accurate.
I would suspect that similarly removing all the Received junk would help
determine the spamicity of the ASN.
However, I think the real test is going to be running a few thousand
emails through bogofilter and comparing the scores with and without
spamitarium preprocessing.
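
A rough sketch of how that comparison could be tallied, assuming you
already have a spamicity score for each message from both runs plus a
hand-verified spam/ham label. The function names and the 0.99 cutoff are
illustrative assumptions, not anything taken from bogofilter or
spamitarium:

```python
def error_counts(scores, is_spam, cutoff=0.99):
    """Return (false_positives, false_negatives) at an assumed spam cutoff."""
    fp = sum(1 for s, spam in zip(scores, is_spam) if s >= cutoff and not spam)
    fn = sum(1 for s, spam in zip(scores, is_spam) if s < cutoff and spam)
    return fp, fn


def mean_shift(raw_scores, cleaned_scores, is_spam, want_spam):
    """Average (cleaned - raw) score change over messages of one class."""
    diffs = [c - r for r, c, spam in zip(raw_scores, cleaned_scores, is_spam)
             if spam == want_spam]
    return sum(diffs) / len(diffs) if diffs else 0.0


def compare_runs(raw_scores, cleaned_scores, is_spam, cutoff=0.99):
    """Summarize both runs: error counts and per-class mean score shift."""
    return {
        "raw_errors": error_counts(raw_scores, is_spam, cutoff),
        "cleaned_errors": error_counts(cleaned_scores, is_spam, cutoff),
        "spam_shift": mean_shift(raw_scores, cleaned_scores, is_spam, True),
        "ham_shift": mean_shift(raw_scores, cleaned_scores, is_spam, False),
    }
```

If the preprocessing helps, you would expect fewer errors in the cleaned
run, with spam scores moving up and ham scores moving down on average.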