[PATCH] Better tagging.

Mon Sep 15 01:33:29 CEST 2003

David Relson <relson at osagesoftware.com> writes:

>> True. My idea would then be to use the traditional "return TOKEN"
>> approach, but extend it to add the h: tag when we're in header mode
>> (as opposed to body mode).
>
> Given the sensitivity of the parser to changes in the rules, I've opted
> for the conservative approach, i.e. parse with new and old rules,
> identify differences, revise to avoid problems, then (and only then) do
> the big test to quantify the value of the changes.

That's fine.

> In the patch, the "h:" tag is set by calling set_tag("Head") for each
> newline.  Interesting lines call set_tag() which replaces the "h:" tag.
>
> Are you suggesting that "charset=us-ascii" should produce "h:charset"
> and "h:us-ascii" ??

Yes, I am. It shouldn't replace "h:us-ascii" though. We're not yet
scoring word pairs as compounds or using conditional probabilities.
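
To illustrate what I mean, here is a rough sketch in Python rather than
the actual lexer rules. The names tag_tokens and TOKEN_RE and the
character class used for splitting are assumptions made up for this
example; they are not taken from the patch. The point is only that each
token found in header mode gets its own "h:" prefix, and none replaces
another:

```python
import re

# Hypothetical token pattern: runs of letters, digits, '.', '_' and '-'.
# The real parser's rules are more involved; this is for illustration only.
TOKEN_RE = re.compile(r"[A-Za-z0-9._-]+")


def tag_tokens(line, in_header):
    """Split a line into tokens, prefixing each with "h:" in header mode."""
    prefix = "h:" if in_header else ""
    return [prefix + tok for tok in TOKEN_RE.findall(line)]


print(tag_tokens("charset=us-ascii", in_header=True))
# ['h:charset', 'h:us-ascii']
print(tag_tokens("charset=us-ascii", in_header=False))
# ['charset', 'us-ascii']
```

Both tokens are kept as independent entries, which matches scoring
single words only.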

-- 
Matthias Andree

Encrypt your mail: my GnuPG key ID is 0x052E7D95