Ignoring headers [was: SpamAssassin's header lines]

Eric Seppanen eds at reric.net
Mon Oct 7 21:54:31 CEST 2002


> >What we need is to allow the user to specify which headers to ignore.
> >Tokens added by SA or whatever may be considered useful by some, not by
> >others.  There's no question they're an external bias to the system...
> >depends on whether or not you think that's good.
> >
> >Since I've already proposed major lexer changes, I will take this on also.  I
> >suppose the list of headers to ignore will be spec'ed in the RC/ini file that
> >Eric S. is working?  Any other ideas... let me know.
> 
> Mark,
> 
> Don't forget about Eric Seppanen's plans to implement an ignore list.  It 
> seems that he's oriented more towards individual words while you're 
> orienting towards header lines.   Also, he's looking at tokens after 
> they're parsed and you seem to be looking at parser changes.
> 
> Perhaps the two of you should put your heads together and see what kind of 
> solution you can design...

Well, I don't think that the "ignore-list" idea can (or should) be 
expanded to be aware of message structure (what's a header, what's not) 
because the ignore-list support is literally only a few lines of code when 
we look up the spamicity of a token.
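
To give a feel for how small that hook is, here is a rough sketch in C.  It 
is not bogofilter's actual code: the names is_ignored(), lookup_spamicity() 
and token_spamicity(), the hard-coded list, and the probability values are 
all made up for illustration.  The only point is that the check sits right 
in front of the per-token lookup and knows nothing about message structure.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical ignore list; a real one would be loaded from a user file. */
static const char *const ignore_list[] = { "x-spam-status", "x-spam-level" };

/* Return 1 if the token is on the ignore list, 0 otherwise. */
static int is_ignored(const char *token)
{
    size_t i;

    for (i = 0; i < sizeof ignore_list / sizeof ignore_list[0]; i++) {
        if (strcmp(token, ignore_list[i]) == 0)
            return 1;
    }
    return 0;
}

/* Stand-in for the real wordlist lookup; the values are invented. */
static double lookup_spamicity(const char *token)
{
    return strcmp(token, "viagra") == 0 ? 0.99 : 0.4;
}

/*
 * Return 1 and store the token's spamicity in *out if the token should be
 * scored; return 0 and leave *out untouched if the token is ignored.
 */
static int token_spamicity(const char *token, double *out)
{
    if (is_ignored(token))
        return 0;
    *out = lookup_spamicity(token);
    return 1;
}
```

Note that the check only ever sees a bare token, so it cannot tell whether 
the word came from a header or from the body.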

The idea that bogofilter should be aware of message headers, possibly 
having the ability to add/remove/modify them, has merit.  I think that for 
now, anyway, our treatment of headers should be a separate problem, 
because it has to look at the message early on, before we shred it into a 
"bag of tokens".

I don't think that the treatment (keep or discard; examine or 
don't-examine) of headers like the ones SpamAssassin adds should be 
hard-wired into the lexer; I think there will be users who are dead-set 
against it, and others who are dead-set for it.
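
As a sketch of what a user-controlled version might look like, the C below 
runs over the raw message before any tokenizing and drops only the header 
fields the user has named.  Everything here is an assumption for 
illustration: header_matches() and strip_ignored_headers() are hypothetical 
names, and the list of names is imagined to come from the user's config 
file rather than from the lexer.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Case-insensitive test: does `line` (of length `len`) start with "name:"? */
static int header_matches(const char *line, size_t len, const char *name)
{
    size_t n = strlen(name);
    size_t i;

    if (len <= n || line[n] != ':')
        return 0;
    for (i = 0; i < n; i++) {
        if (tolower((unsigned char)line[i]) != tolower((unsigned char)name[i]))
            return 0;
    }
    return 1;
}

/*
 * Copy `msg` to `out`, dropping header fields whose names appear in
 * `ignore` (an array of `count` names).  Folded continuation lines are
 * dropped along with their header.  The body is copied untouched.
 * `out` must have room for strlen(msg) + 1 bytes.
 */
static void strip_ignored_headers(const char *msg, char *out,
                                  const char *const *ignore, size_t count)
{
    int in_headers = 1;  /* still before the blank line ending the headers */
    int skipping = 0;    /* currently inside an ignored header field */

    while (*msg != '\0') {
        const char *eol = strchr(msg, '\n');
        /* Length of this line, including its newline if it has one. */
        size_t len = eol ? (size_t)(eol - msg) + 1 : strlen(msg);

        if (in_headers) {
            if (msg[0] == '\n' || (msg[0] == '\r' && msg[1] == '\n')) {
                in_headers = 0;          /* blank line: body starts here */
                skipping = 0;
            } else if (msg[0] != ' ' && msg[0] != '\t') {
                size_t i;

                /* A new header field: decide whether to drop it. */
                skipping = 0;
                for (i = 0; i < count; i++) {
                    if (header_matches(msg, len, ignore[i])) {
                        skipping = 1;
                        break;
                    }
                }
            }
            /* Continuation lines keep the previous value of `skipping`. */
        }
        if (!in_headers || !skipping) {
            memcpy(out, msg, len);
            out += len;
        }
        msg += len;
    }
    *out = '\0';
}
```

A user who wants the SpamAssassin headers scored passes an empty list; a 
user who doesn't passes their names.  Neither choice is baked into the 
lexer.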

I have a gut feeling that one day, the flex lexer.l won't be good enough.  
Possible reasons: MIME-encoded messages, specialized treatment of header 
fields, HTML handling, international charset support...


