Ignoring headers [was: SpamAssassin's header lines]
Eric Seppanen
eds at reric.net
Mon Oct 7 21:54:31 CEST 2002
> >What we need is to allow the user to specify which headers to ignore. Tokens
> >added by SA or whatever may be considered useful by some, not by others.
> >There's no question they're an external bias to the system... depends on
> >whether or not you think that's good.
> >
> >Since I've already proposed major lexer changes, I will take this on also. I
> >suppose the list of headers to ignore will be spec'ed in the RC/ini file that
> >Eric S. is working on? Any other ideas... let me know.
>
> Mark,
>
> Don't forget about Eric Seppanen's plans to implement an ignore list. It
> seems that he's oriented more towards individual words while you're
> orienting towards header lines. Also, he's looking at tokens after
> they're parsed and you seem to be looking at parser changes.
>
> Perhaps the two of you should put your heads together and see what kind of
> solution you can design...

Well, I don't think that the "ignore-list" idea can (or should) be
expanded to be aware of message structure (what's a header, what's not),
because the ignore-list support is literally only a few lines of code at
the point where we look up the spamicity of a token.
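
As a rough illustration of how small such a check can be, here is a Python
sketch. It is not bogofilter's code: the function name, the count tables, and
the simple spam/(spam+good) ratio are all assumptions made for the example.

```python
# Hypothetical sketch: names and the scoring formula are illustrative only.

def spamicity(token, spam_counts, good_counts, ignore_list):
    """Return a score in [0, 1], or None if the token carries no evidence."""
    # The ignore-list check: one membership test before the lookup.
    if token in ignore_list:
        return None
    spam = spam_counts.get(token, 0)
    good = good_counts.get(token, 0)
    if spam + good == 0:
        return None  # token never seen in training
    # Simplified ratio, standing in for whatever formula is really used.
    return spam / (spam + good)
```

A caller would simply skip any token for which this returns None. Note that
the check sees only the token itself, with no idea whether it came from a
header or from the body.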

The idea that bogofilter should be aware of message headers, possibly
having the ability to add/remove/modify them, has merit. I think that, for
now anyway, our treatment of headers should be a separate problem,
because it has to look at the message early on, before we shred it into a
"bag of tokens".

I don't think that the treatment (keep or discard; examine or
don't-examine) of headers like the ones SpamAssassin adds should be
hard-wired into the lexer; I think there will be users who are dead-set
against this, and others who are dead-set for it.
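
One way to picture a user-controlled (rather than hard-wired) treatment is a
filter that runs before tokenizing and drops whichever header fields the user
has listed. The Python sketch below is only an assumption about how that
might look; strip_headers and its ignored parameter are hypothetical names,
and the message is assumed to use "\n" line endings with a blank line between
headers and body.

```python
# Hypothetical sketch: a pre-tokenizing filter driven by a user-supplied list.

def strip_headers(message, ignored):
    """Drop header fields whose names appear in `ignored` (case-insensitive)."""
    # Headers end at the first blank line; everything after it is the body.
    head, sep, body = message.partition("\n\n")
    ignored = {name.lower() for name in ignored}
    kept = []
    skipping = False
    for line in head.split("\n"):
        if line[:1] in (" ", "\t"):
            # Continuation line: shares the fate of the field it continues.
            if not skipping:
                kept.append(line)
            continue
        name = line.partition(":")[0].strip().lower()
        skipping = name in ignored
        if not skipping:
            kept.append(line)
    return "\n".join(kept) + sep + body
```

With an empty list the message passes through unchanged, so users who want
the added headers examined lose nothing; users who do not can list them, for
example strip_headers(msg, ["X-Spam-Status"]).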

I have a gut feeling that one day the flex lexer.l won't be good enough.
Possible reasons: MIME-encoded messages, specialized treatment of header
fields, HTML handling, international charset support...