SpamAssassin's header lines

Ben Rosengart br at panix.com
Mon Oct 7 21:32:16 CEST 2002


On Mon, Oct 07, 2002 at 10:10:28AM -0400, Doug Beardsley wrote:
> Ok, I see your point.  I was thinking with the idea that we are trying
> to detect spam based on the statistical characteristics of spam email
> messages.  In their purest form, spam email messages do not contain the
> extra header information. 

In its purest form, spam doesn't contain Received headers either --
those are added by servers along the route.  Should we remove them?
Absolutely not, they are quite valuable.

Nobody has yet explained to me why a user would run a spamassassin-marked
message through bogofilter unless they wanted to.  SA can give a
spam determination via its return code, so it is very straightforward
to get an SA determination without marking up the message.
Furthermore, "spamassassin -d", used as a filter, removes SA markup.
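
As a rough illustration of the return-code approach, here is a minimal
Python sketch.  It assumes that "spamassassin -e" exits non-zero when it
judges the message to be spam; the function name sa_says_spam and the cmd
parameter are made up for this example, and a non-zero exit caused by an
error would be misread as spam.

```python
import subprocess


def sa_says_spam(raw_message: bytes, cmd=("spamassassin", "-e")) -> bool:
    """Ask an external filter for a verdict without keeping its markup.

    Assumption: the command reads the message on stdin and signals
    "spam" with a non-zero exit status (as "spamassassin -e" is
    understood to do).  Its rewritten copy of the message on stdout is
    thrown away, so the original message stays unmarked.
    """
    result = subprocess.run(
        list(cmd),
        input=raw_message,
        stdout=subprocess.DEVNULL,  # discard the marked-up copy
        stderr=subprocess.DEVNULL,
    )
    # Caveat: any non-zero status counts as spam here, including failures.
    return result.returncode != 0
```

The caller keeps the original bytes and can hand them to bogofilter
untouched, so SA's opinion and bogofilter's stay independent.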

So users already have the choice of whether or not they want SA to
influence bogofilter's judgment.  I don't see any reason to
second-guess them, given that we don't have any hard data as to
which would work better, and there are arguments to be made either
way.

> However, the headers that SpamAssassin adds do
> not help either way.  I haven't checked, but I would imagine that the
> tokens from those headers will never be used.  So I guess it doesn't hurt
> to leave them in.  But, it would make sense to ignore them since we know
> that they do not contribute to the detection of spam.

At this point you're just talking about a performance question,
not a correctness question.  From discussion in another thread,
the performance benefit of ignoring neutral words seems to be
questionable.  At least in the current design, one only wants to
ignore words if they are likely to be misleading.  I do not think that
the token "X-Spam-Status:" is likely to be misleading.  Even if it
were, it would be better to add it to a default ignorelist than to
have code in bogofilter specifically to filter it out.
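
To make the ignorelist point concrete, here is a small Python sketch of
treating ignored tokens as data rather than as special-case code.  It is
not bogofilter's tokenizer: the whitespace splitting, the tokens function,
and the contents of DEFAULT_IGNORELIST are all assumptions made up for
illustration.

```python
# Hypothetical default ignore list; the entries are illustrative only.
DEFAULT_IGNORELIST = frozenset({
    "x-spam-status:",
    "x-spam-flag:",
    "x-spam-level:",
})


def tokens(text: str, ignorelist=DEFAULT_IGNORELIST) -> list:
    """Split text on whitespace, lowercase it, and drop ignored tokens.

    The tokenizer itself knows nothing about SpamAssassin; what gets
    skipped is decided entirely by the ignorelist passed in.
    """
    return [t for t in (w.lower() for w in text.split()) if t not in ignorelist]


msg = "X-Spam-Status: Yes, hits=12.3\nSubject: cheap pills\n"
print(tokens(msg))                          # header name dropped
print(tokens(msg, ignorelist=frozenset()))  # user opted to keep it
```

A user who wants SA's markup to count simply supplies a different list;
no code change is needed.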

-- 
Ben Rosengart     (212) 741-4400 x215

Microsoft has argued that open source is bad for business, but you
have to ask, "Whose business?  Theirs, or yours?"    --Tim O'Reilly


