SpamAssassin's header lines

Doug Beardsley dgbeards at southern.edu
Mon Oct 7 16:10:28 CEST 2002


OK, I see your point.  My thinking was that we are trying to detect spam
based on the statistical characteristics of the messages themselves.  In
their purest form, spam messages do not contain the extra header
information, so it would seem that our efforts should not rely on
outside information.  From a theoretical perspective, I still think that
is the way to go.  From a practical perspective, the presence of
"{SPAM?}" is a big clue that can aid our classification, so it would be
good to use it.  The headers that SpamAssassin adds, however, do not help
either way, since they appear on spam and non-spam messages alike.  I
haven't checked, but I would imagine that the tokens from those headers
will never be used.  So I guess it doesn't hurt to leave them in, but it
would make sense to ignore them since we know that they do not
contribute to the detection of spam.
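
To make the "ignore them" option concrete, here is a rough sketch in
Python of a pre-filter that could run before tokenizing.  It is only an
illustration, not bogofilter code.  The function name and the
strip_subject_tag parameter are made up, and it assumes the SpamAssassin
headers all start with "X-Spam-" and that the subject tag is the literal
string "{SPAM?}".

```python
import email

# Assumption: SpamAssassin's added headers share this prefix
# (e.g. X-Spam-Status, X-Spam-Level).
SPAM_HEADER_PREFIX = "x-spam-"
# Assumption: the subject tag is this literal string.
SUBJECT_TAG = "{SPAM?}"


def strip_spamassassin_markup(raw, strip_subject_tag=False):
    """Hypothetical pre-filter: return the message text without X-Spam-* headers.

    If strip_subject_tag is true, also remove "{SPAM?}" from the Subject.
    """
    msg = email.message_from_string(raw)

    # Deleting a header name removes every occurrence of it; deleting a
    # name that is already gone is a no-op, so duplicates are harmless.
    for name in list(msg.keys()):
        if name.lower().startswith(SPAM_HEADER_PREFIX):
            del msg[name]

    # The tag is kept by default, since it may be a useful token.
    if strip_subject_tag and msg["Subject"] is not None:
        cleaned = msg["Subject"].replace(SUBJECT_TAG, "").strip()
        # replace_header keeps the header in its original position.
        msg.replace_header("Subject", cleaned)

    return msg.as_string()
```

Called with the default arguments it drops only the headers and leaves
the subject tag in place as input for the classifier; passing
strip_subject_tag=True gives the stricter "message contents only"
behaviour.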

Doug Beardsley

On Mon, Oct 07, 2002 at 02:00:10PM -0400, Ben Rosengart wrote:
> On Mon, Oct 07, 2002 at 09:41:23AM -0400, Doug Beardsley wrote:
> > 
> > I don't see how SpamAssassin's header lines would make a difference.
> > The header lines are appended to every message that is checked by
> > SpamAssassin whether it is flagged as spam or not.  If your ISP/mail
> > server uses SpamAssassin, then all of your email messages will have a
> > SpamAssassin header in them.  This information is clearly not useful for
> > our purposes of spam classification.  What is interesting is the
> > probability of the "{SPAM?}" (without quotes) string that SpamAssassin
> > sometimes adds to messages that it classifies as spam.  I think we
> > should get rid of all those headers and maybe have an optional parameter
> > allowing us to get rid of the "{SPAM?}" string in the subject line so
> > they do not interfere with bogofilter's detection.
> 
> What do you mean, "interfere"?  This is valid input!
> 
> -- 
> Ben Rosengart     (212) 741-4400 x215
> 
> Microsoft has argued that open source is bad for business, but you
> have to ask, "Whose business?  Theirs, or yours?"    --Tim O'Reilly


