Why strip headers?

Ben Finney ben at benfinney.id.au
Fri May 6 03:59:34 CEST 2005


On 05-May-2005, David Relson wrote:
> Ben Finney wrote:
> > =====
> > Moreover, headers which do not directly influence the email in any
> > functional way, nor are visible to the end-user in a standard
> > graphical MUA, are highly likely to contain information which 
> > spammers think will detract from normal statistical filtering. It
> > is therefore desirable to remove these elements, specifically
> > X-headers, prior to filtering.  Spamitarium removes all invisible,
> > non-functional header lines.
> > =====
> > 
> > Is it foolishly naïve of me to think that bogofilter knows much
> > more about my personal mail history than some spammer, and can
> > judge those bogus headers as is?
> 
> All bogofilter knows about your email is which ones you've told it
> are spam and which ones are ham.  If there are different X-Headers
> in the two message sets, then their presence may well help
> bogofilter in its spam vs ham scoring.

Right. So for messages that are *ham*, that contain X-Foo header
fields set by well-behaved software or knowledgeable correspondents,
why would I want bogofilter not to see those and learn from them?

> Some (many?) mail delivery agents add X-Header lines to a message.
> If _yours_ adds one or more X-Header lines, bogofilter will see them in
> _every_ ham and _every_ spam.  The result is tokens with scores of
> 0.5 which are ignored when scoring.

And if I want bogofilter to learn from the X-Foo header fields, how
does stripping them help me?
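
To make the 0.5 point concrete, here is a minimal sketch of a
relative-frequency token score. It is a simplification for illustration,
not bogofilter's actual calculation (which also applies smoothing); the
function names and the min_dev threshold value are assumptions chosen for
this example.

```python
def token_spamicity(spam_with, spam_total, ham_with, ham_total):
    """Relative-frequency estimate of how spammy a token is (0.0 to 1.0).

    Simplified illustration only; a real filter also smooths this value.
    """
    spam_rate = spam_with / spam_total
    ham_rate = ham_with / ham_total
    if spam_rate + ham_rate == 0:
        return 0.5  # token never seen: no evidence either way
    return spam_rate / (spam_rate + ham_rate)


def is_informative(score, min_dev=0.1):
    """True if the score is far enough from neutral (0.5) to count.

    min_dev is an assumed threshold for this sketch.
    """
    return abs(score - 0.5) >= min_dev


# A header the local delivery agent adds to every message, ham or spam:
mda_score = token_spamicity(400, 400, 600, 600)

# A hypothetical X-Foo token seen in 2 of 400 spam and 300 of 600 ham:
xfoo_score = token_spamicity(2, 400, 300, 600)

print(mda_score, is_informative(mda_score))
print(xfoo_score, is_informative(xfoo_score))
```

Under these assumptions the header that appears everywhere lands exactly
on the neutral score and is ignored, while the X-Foo token seen mostly in
ham is strongly informative. Stripping it would throw that evidence away.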

In particular, many administrators configure SpamAssassin to record
its verdict on each message in X-Spam or other header fields, so that
individual users can decide for themselves whether or not to dump the
message.  This links nicely with bogofilter, since it can learn about
spam or ham by seeing how well my decisions match SpamAssassin's
results.

> Stripping X-Header lines, as Tom does, may or may not have an
> effect. It all depends on your particular mail setup.

My main concern with spamitarium is that it assumes X-Foo header
fields are malicious by default. On the contrary, there is often a lot
of useful information in them that bogofilter can learn from.
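
As a rough sketch of a less drastic alternative, a pre-filter could
strip X- header fields selectively rather than wholesale. This is not how
spamitarium works; the function name and the allow-list below are
hypothetical, and it uses only Python's standard email module.

```python
from email import message_from_string

# Hypothetical allow-list: X- header fields worth keeping for the filter.
KEEP_PREFIXES = ("x-spam-",)


def strip_x_headers(raw, keep_prefixes=KEEP_PREFIXES):
    """Remove X- header fields from a raw message, except allowed ones."""
    msg = message_from_string(raw)
    for name in list(msg.keys()):
        lower = name.lower()
        if lower.startswith("x-") and not lower.startswith(keep_prefixes):
            del msg[name]  # deletes every occurrence of this field name
    return msg.as_string()


raw = (
    "From: a@example.com\n"
    "X-Spam-Status: No\n"
    "X-Junk: abc\n"
    "Subject: hi\n"
    "\n"
    "body\n"
)
print(strip_x_headers(raw))
```

Here X-Spam-Status survives for bogofilter to learn from, while the
unlisted X-Junk field is dropped.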

-- 
 \        "The most merciful thing in the world... is the inability of |
  `\         the human mind to correlate all its contents."  -- Howard |
_o__)                                                Philips Lovecraft |
Ben Finney <ben at benfinney.id.au>