[PATCH] Better tagging.

David Relson relson at osagesoftware.com
Mon Sep 15 01:04:57 CEST 2003


On Mon, 15 Sep 2003 00:30:06 +0200
Matthias Andree <matthias.andree at gmx.de> wrote:

> David Relson <relson at osagesoftware.com> writes:
> 
> > The modified rules include spaces in tokens like "h:Mime-Version:
> > 1.0". Currently tokens can't have spaces, a detail that bogoutil
> > cares about.
> 
> I missed that, looking only at the patch. Clearly, this needs to be
> split.
> 
> > Currently 'charset=us-ascii' and 'charset="us-ascii"' both generate
> > 'charset' and 'us-ascii'.  With the new rules they generate
> > 'h:charset=us-ascii' and 'h:charset="us-ascii"', which is another
> > inclusion of an illegal character.
> 
> True. My idea would then be to use the traditional "return TOKEN"
> approach, but extend it to add the h: tag when we're in header mode
> (as opposed to body mode).

Given the sensitivity of the parser to changes in the rules, I've opted
for the conservative approach, i.e. parse with new and old rules,
identify differences, revise to avoid problems, then (and only then) do
the big test to quantify the value of the changes.

In the patch, the "h:" tag is set by calling set_tag("Head") at each
newline.  Lines of interest then call set_tag() again, which replaces
the "h:" tag.
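
As a rough illustration of that per-line tagging (not the patch itself,
which lives in the lexer rules), here is a small Python sketch. Only
set_tag("Head") and the "h:" prefix come from the description above; the
Tagger class, the TAGS table, and the "s:" prefix for Subject lines are
hypothetical names made up for the example.

```python
# Sketch of per-line tag state. Assumption: only "Head" -> "h:" is taken
# from the message; the "Subject" -> "s:" entry is invented for illustration.
TAGS = {"Head": "h:", "Subject": "s:"}


class Tagger:
    def __init__(self):
        self.tag = ""

    def set_tag(self, name):
        # Replace whatever tag is currently active.
        self.tag = TAGS[name]

    def tokenize_header(self, text):
        tokens = []
        for line in text.splitlines():
            # Every new line starts out with the generic header tag.
            self.set_tag("Head")
            name, sep, value = line.partition(":")
            if not sep:
                # No "Name:" part, so treat the whole line as the value.
                value = line
            elif name in TAGS:
                # A line of interest replaces the generic tag.
                self.set_tag(name)
            # Splitting on whitespace keeps spaces out of the tokens.
            for word in value.split():
                tokens.append(self.tag + word)
        return tokens
```

Because the value is split on whitespace before the tag is prepended, no
token in this sketch can contain a space.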

Are you suggesting that "charset=us-ascii" should produce "h:charset"
and "h:us-ascii"?
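
To make the question concrete, here is a minimal Python sketch of that
reading: split the text into tokens as before, then prepend "h:" only
when in header mode. The TOKEN_RE pattern and the tag_tokens() function
are assumptions for illustration; the real lexer rules are more involved.

```python
import re

# Assumption: a token is a run of letters, digits, '-' or '.', starting with
# a letter or digit. '=', '"', ':' and spaces act as separators.
TOKEN_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9.-]*")


def tag_tokens(text, in_header):
    # Tokenize the same way in both modes; only the prefix differs.
    prefix = "h:" if in_header else ""
    return [prefix + tok for tok in TOKEN_RE.findall(text)]
```

Under these assumptions both 'charset=us-ascii' and 'charset="us-ascii"'
yield the same two header tokens, and neither contains a space, '=' or a
quote character.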



