Why strip headers?

David Relson relson at osagesoftware.com
Fri May 6 05:09:37 CEST 2005


On Fri, 6 May 2005 11:59:34 +1000
Ben Finney wrote:

> On 05-May-2005, David Relson wrote:
> > Ben Finney wrote:
> > > =====
> > > Moreover, headers which do not directly influence the email in any
> > > functional way, nor are visible to the end-user in a standard
> > > graphical MUA, are highly likely to contain information which 
> > > spammers think will detract from normal statistical filtering. It
> > > is therefore desirable to remove these elements, specifically
> > > X-headers, prior to filtering.  Spamitarium removes all invisible,
> > > non-functional header lines.
> > > =====
> > > 
> > > Is it foolishly naïve of me to think that bogofilter knows much
> > > more about my personal mail history than some spammer, and can
> > > judge those bogus headers as is?
> > 
> > All bogofilter knows about your email is which ones you've told it
> > are spam and which ones are ham.  If there are different X-Headers
> > in the two message sets, then their presence may well help
> > bogofilter in its spam vs ham scoring.
> 
> Right. So for messages that are *ham*, that contain X-Foo header
> fields set by well-behaved software or knowledgeable correspondents,
> why would I want bogofilter not to see those and learn from them?
> 
> > Some (many?) mail delivery agents add X-Header lines to a message.
> > If _yours_ adds one or more X-Header lines, bogofilter will see them in
> > _every_ ham and _every_ spam.  The result is tokens with scores of
> > 0.5 which are ignored when scoring.
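
To illustrate the 0.5 point, here is a simplified sketch in Python. It is
not bogofilter's actual code: the function names are made up, the formula
is the basic per-token frequency ratio without any smoothing, and the
min_dev cutoff of 0.1 is an assumed value that depends on your
configuration.

```python
def token_score(spam_hits, ham_hits, spam_msgs, ham_msgs):
    """Simplified spamicity of one token (assumption: no smoothing).

    spam_hits / ham_hits: number of spam / ham messages containing the token.
    spam_msgs / ham_msgs: total number of spam / ham messages trained.
    """
    spam_freq = spam_hits / spam_msgs
    ham_freq = ham_hits / ham_msgs
    if spam_freq + ham_freq == 0:
        return 0.5  # token never seen: treat as neutral
    return spam_freq / (spam_freq + ham_freq)


def is_informative(score, min_dev=0.1):
    """True if the score is far enough from neutral (0.5) to be used.

    min_dev is an assumed threshold, not necessarily your setting.
    """
    return abs(score - 0.5) >= min_dev


# A header added by the local MDA shows up in every message of both sets.
mda_header = token_score(spam_hits=400, ham_hits=600, spam_msgs=400, ham_msgs=600)

# A header that shows up mostly in spam.
spammy_header = token_score(spam_hits=90, ham_hits=1, spam_msgs=100, ham_msgs=100)
```

Here mda_header comes out at exactly 0.5, so is_informative() rejects it,
while spammy_header lands close to 1.0 and would count toward the score.
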
> 
> And if I want bogofilter to learn from the X-Foo header fields, how
> does stripping them help me?
> 
> In particular, many administrators configure spamassassin to make
> decisions about a mail and put those decisions in X-Spam or other
> header fields, so that individual users can decide for themselves
> about whether or not to dump the message.  This links nicely with
> bogofilter, since it can learn about spam or ham by seeing how well my
> decisions match spamassassin's results.
> 
> > Stripping X-Header lines, as Tom does, may or may not have an
> > effect. It all depends on your particular mail setup.
> 
> My main concern with spamitarium is that it assumes X-Foo header
> fields are malicious by default. On the contrary, there is often a lot
> of useful information in them that bogofilter can learn from.

Ben,

Since Tom is distributing the source code, you can modify it as you see
fit.  If I were in your shoes, I'd implement the change as a command
line switch and send a patch to Tom.  If he accepted the patch, then
you might be able to use future versions without the need to customize
them.
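
A rough sketch of what such a switch could look like, written in Python
purely for illustration. This is not spamitarium's code and makes no
claim about its real options: the --keep-x-headers flag and the function
names are hypothetical, and it treats every header whose name starts
with "X-" as strippable.

```python
import argparse
import email
import sys


def strip_x_headers(raw_message, keep_x_headers=False):
    """Return the message text, minus X- headers unless asked to keep them."""
    msg = email.message_from_string(raw_message)
    if not keep_x_headers:
        # set() avoids deleting the same (possibly repeated) name twice;
        # del msg[name] removes every occurrence of that header.
        for name in set(msg.keys()):
            if name.lower().startswith("x-"):
                del msg[name]
    return msg.as_string()


def main(argv=None):
    """Filter a message from stdin to stdout (hypothetical CLI)."""
    parser = argparse.ArgumentParser(description="Strip X- headers from a message")
    parser.add_argument(
        "--keep-x-headers",
        action="store_true",
        help="leave X- headers in place so the filter can learn from them",
    )
    args = parser.parse_args(argv)
    sys.stdout.write(strip_x_headers(sys.stdin.read(), args.keep_x_headers))
```

Calling main() at the bottom of a script would turn this into a pipe
filter; the default keeps the current stripping behaviour, so existing
setups would not change unless the switch is given.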

Ciao,

David
