[bogofilter] spamitarium [was: using block_on_subnets]
relson at osagesoftware.com
Fri Apr 30 22:20:58 EDT 2004
On Fri, 30 Apr 2004 22:13:34 -0400
Tom Allison wrote:
> Tom Anderson wrote:
> > From: "Jason A. Smith" <jazbo at jazbo.dyndns.org>
> >>By stripping the bogus Received headers, aren't you throwing away
> >>potentially useful information which would help bogofilter classify
> >>the email as spam? I would think those headers contain valuable data
> > Others may disagree, but think about it this way: the spammer
> > intentionally inserted fake received lines because he thought that
> > it would tend to decrease the effectiveness of your filter. I agree
> > with the spammers that bogus routing information may cause the email
> > to look hammier than otherwise. Sometimes it might, sometimes it
> > might not. But either way, it is just extra noise... a red herring
> > to throw off the real trail. I'd prefer to filter on true
> > information than play the spammer's game. Plus, the very fact that
> > the line is fake is recorded as "rcvd:untrusted", which helps to
> > classify it correctly as spam, whereas the alternative may look
> > hammy.
> It seems that if you could successfully remove all the chaff it would
> only improve things. Consider bogofilter and Bayesian filtering in
> general. Until Bayes was proven effective, spammers didn't load the
> bottom of the body with random dictionary spews.
> Couldn't the same analogy be applied to Received headers?
> If we could accurately remove the dictionary spews, Bayesian filtering
> of all types would be very accurate.
> I would suspect that similarly removing all the Received junk would
> help determine the spamicity of the ASN.
> However, I think the real test is going to be running a few thousand
> emails through the filter and comparing the scores with and without
> the spamitarium preprocessing applied to the emails.
Exactly. Ideas and theories have their value. They lead to tests and
experiments which help separate the wheat from the chaff :-) By all
means, let us know what you find out.
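A minimal sketch of how such a comparison might be tallied, assuming you already have one score per message from each run (raw and spamitarium-processed) plus a known ham/spam label for every message. The function names, the `Result` type, and the 0.5 cutoff are illustrative assumptions, not part of bogofilter or spamitarium; substitute the cutoff from your own configuration.

```python
from dataclasses import dataclass


@dataclass
class Result:
    false_positives: int  # ham scored at or above the cutoff
    false_negatives: int  # spam scored below the cutoff


def evaluate(scores, is_spam, cutoff=0.5):
    """Count misclassifications for one run over a labeled corpus.

    scores  -- one spamicity score per message (0.0 to 1.0)
    is_spam -- the true label for each message, in the same order
    cutoff  -- assumed decision threshold; not a bogofilter default
    """
    if len(scores) != len(is_spam):
        raise ValueError("scores and labels must have the same length")
    fp = sum(1 for s, spam in zip(scores, is_spam) if not spam and s >= cutoff)
    fn = sum(1 for s, spam in zip(scores, is_spam) if spam and s < cutoff)
    return Result(fp, fn)


def compare(raw_scores, processed_scores, is_spam, cutoff=0.5):
    """Evaluate the same corpus scored with and without preprocessing."""
    return (evaluate(raw_scores, is_spam, cutoff),
            evaluate(processed_scores, is_spam, cutoff))
```

Feeding both runs the same messages in the same order keeps the comparison fair: any change in the error counts then comes from the preprocessing rather than from a different sample.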