Why strip headers?

David Relson relson at osagesoftware.com
Fri May 6 03:43:16 CEST 2005


On Fri, 6 May 2005 09:59:14 +1000
Ben Finney wrote:

> On 06-May-2005, Ben Finney wrote:
> > On 05-May-2005, Tom Anderson wrote:
> > > I also clean up my headers with this one:
> > > http://orderamidchaos.com/bogofilter/spamitarium
> > 
> > I don't see the purpose of that one.  Why would you not give
> > bogofilter all the information about the original message that you
> > can, to help it learn?
> 
> Specifically, this doesn't sound right (from spamitarium's POD
> documentation):
> 
> =====
> Moreover, headers which do not directly influence the email in any
> functional way, nor are visible to the end-user in a standard
> graphical MUA, are highly likely to contain information which 
> spammers think will detract from normal statistical filtering. It
> is therefore desireable to remove these elements, specifically 
> X-headers, prior to filtering.  Spamitarium removes all invisible,
> non-functional header lines.
> =====
> 
> Is it foolishly naïve of me to think that bogofilter knows much more
> about my personal mail history than some spammer, and can judge those
> bogus headers as is?

Hi Ben,

All bogofilter knows about your email is which ones you've told it are
spam and which ones are ham.  If there are different X-Headers in the
two message sets, then their presence may well help bogofilter in its
spam vs ham scoring.

Some (many?) mail delivery agents add X-Header lines to a message.  If
_yours_ adds one or more X-Header lines, bogofilter will see them in _every_
ham and _every_ spam.  The result is tokens with scores of 0.5 which
are ignored when scoring.
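
A minimal sketch of why that happens, assuming a simplified per-token
score (relative spam frequency over combined frequency) rather than
bogofilter's actual calculation.  The function names, message counts,
and the 0.1 cutoff are made up for illustration:

```python
def token_score(spam_count, ham_count, spam_total, ham_total):
    """Simplified spamminess estimate: 0.0 = pure ham, 1.0 = pure spam.

    This is an illustrative formula, not bogofilter's real one.
    """
    spam_freq = spam_count / spam_total
    ham_freq = ham_count / ham_total
    if spam_freq + ham_freq == 0:
        return 0.5  # never-seen token: no evidence either way
    return spam_freq / (spam_freq + ham_freq)


def is_informative(score, min_dev=0.1):
    """A token only counts if its score is far enough from neutral 0.5.

    min_dev=0.1 is an assumed cutoff chosen for this example.
    """
    return abs(score - 0.5) >= min_dev


# Header added by the local delivery agent: in all 400 spam and all 1000 ham.
local_header = token_score(400, 1000, 400, 1000)

# Header seen in 300 of 400 spam but only 10 of 1000 ham.
spammy_header = token_score(300, 10, 400, 1000)

print(local_header, is_informative(local_header))
print(round(spammy_header, 3), is_informative(spammy_header))
```

A header present in every message scores exactly 0.5 and falls inside
the ignored band, while one that shows up mostly in spam scores close
to 1.0 and does count.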

Stripping X-Header lines, as Tom does, may or may not have an effect.
It all depends on your particular mail setup.
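
For anyone who wants to try it both ways, here is a rough sketch of
stripping X-headers with Python's standard email package.  It is not
how spamitarium does it (that tool removes more than just X-headers);
the function name and the sample message are hypothetical:

```python
import email


def strip_x_headers(raw_message):
    """Return the message text with every X-* header line removed."""
    msg = email.message_from_string(raw_message)
    # Copy the key list first, since we delete headers while looping.
    for name in list(msg.keys()):
        if name.lower().startswith("x-"):
            del msg[name]  # removes all occurrences of that header
    return msg.as_string()


sample = (
    "From: someone@example.com\n"
    "X-Spam-Flag: NO\n"
    "X-Mailer: ExampleMailer 1.0\n"
    "Subject: hello\n"
    "\n"
    "body text\n"
)

cleaned = strip_x_headers(sample)
print(cleaned)
```

Training one wordlist on the original messages and another on the
cleaned ones is one way to see whether stripping makes any difference
on a given setup.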

HTH,

David



