[PATCH] Better tagging.

David Relson relson at osagesoftware.com
Sat Sep 13 14:25:22 CEST 2003


Hi Michael,

Bogofilter's development is an ongoing process.  The set of headers
selected for tagging is based on Paul Graham's "Better Bayesian
Filtering" article and is not cast in stone.  
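
To show what "tagging" means here: following Graham's idea, a word seen in, say, the Subject: line is counted separately from the same word in the body. One way to do that is to prepend a header-specific prefix to each token. Below is a minimal sketch under that assumption. The header kinds, the prefix strings, and `tag_token()` are all hypothetical and are not bogofilter's actual choices.

```c
#include <stdio.h>

/* Hypothetical header kinds; bogofilter's real set of tagged headers differs. */
typedef enum { HDR_NONE, HDR_SUBJECT, HDR_FROM, HDR_TO } header_kind;

/* Hypothetical tag prefixes, one per header kind. */
static const char *tag_for(header_kind kind)
{
    switch (kind) {
    case HDR_SUBJECT: return "subj:";
    case HDR_FROM:    return "from:";
    case HDR_TO:      return "to:";
    default:          return "";      /* body tokens are left untagged */
    }
}

/* Writes "<tag><token>" into out (capacity outlen).
 * Returns the number of characters written, or -1 if it would not fit. */
static int tag_token(header_kind kind, const char *token,
                     char *out, size_t outlen)
{
    int n = snprintf(out, outlen, "%s%s", tag_for(kind), token);
    if (n < 0 || (size_t)n >= outlen)
        return -1;
    return n;
}
```

With this scheme "free" in a Subject: line becomes "subj:free", so the filter can learn a different spam probability for it than for "free" in the body.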

As you know, there's been some recent work to exclude tokens likely to
be unique, in particular message IDs.  That's also why Delivery-Date:,
Resent-Message-ID:, In-Reply-To:, and References: get special treatment.
Seems like we now have three sets of rules:

1 - original
2 - specially treated (as described above)
3 - proposed changes.
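
The message doesn't say what the special treatment consists of, so the sketch below covers only the classification step: it decides whether a header name is one of the four listed above. The table contents come from the message. The function names are hypothetical, and matching header names case-insensitively is an assumption about how such a check would normally be written.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Headers named above as getting special treatment because their
 * values are likely to be unique to a single message. */
static const char *const special_headers[] = {
    "Delivery-Date:",
    "Resent-Message-ID:",
    "In-Reply-To:",
    "References:",
};

/* Case-insensitive string equality for header names. */
static bool header_name_equals(const char *a, const char *b)
{
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return false;
        a++;
        b++;
    }
    return *a == *b;   /* equal only if both strings end together */
}

/* Returns true if name (including the trailing colon) is one of the
 * specially treated headers. */
static bool is_special_header(const char *name)
{
    size_t n = sizeof special_headers / sizeof special_headers[0];
    for (size_t i = 0; i < n; i++)
        if (header_name_equals(name, special_headers[i]))
            return true;
    return false;
}
```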

Looks like I'll have to run some tests to measure the effectiveness of
the different rule sets.

Using "char *" instead of "word_t *" is pretty painless.  It does make
the parser API less uniform, which is a bad thing.  Given that
set_tag()'s parameter isn't used in other routines, it should be
acceptable.
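
Here is a rough sketch of the trade-off being discussed. Tokens stay counted word_t strings, while the tag is set from a plain NUL-terminated string, so callers can pass a literal such as "subj:" directly. The word_t layout shown here (length plus byte pointer), the `current_tag` variable, and `tagged_length()` are assumptions made for illustration and don't come from bogofilter's source.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for word_t: a counted byte string that is
 * not necessarily NUL-terminated. */
typedef struct {
    size_t leng;
    const unsigned char *text;
} word_t;

/* Tag currently applied to tokens; NULL means "no tag". */
static const char *current_tag = NULL;

/* char * parameter: a string literal can be passed directly, with no
 * need to build a word_t just to set a tag. */
static void set_tag(const char *tag)
{
    current_tag = tag;
}

/* Length of a token once the active tag is prepended to it. */
static size_t tagged_length(const word_t *token)
{
    size_t tag_len = current_tag ? strlen(current_tag) : 0;
    return tag_len + token->leng;
}
```

This is the non-uniformity mentioned above: tokens are word_t but the tag is char *. Because only set_tag() sees the tag parameter, the mismatch stays contained.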

Explicitly returning NONE (or perhaps EOF or EOM or something) rather
than 0 is good.  I recently made some similar changes, converting -1's
to EOF, which is accurate and more informative.
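
As an illustration of the naming point: a lexer step that returns named constants tells the reader what each result means, where bare 0 and -1 do not. The enum, `next_class()`, and the toy classification rules are all hypothetical. EOF is the standard macro from <stdio.h>.

```c
#include <stdio.h>   /* EOF */

/* Hypothetical token classes; NONE replaces a bare 0. */
typedef enum {
    NONE = 0,    /* nothing of interest at this position */
    TOKEN        /* a token character was consumed */
} token_class;

/* Hypothetical lexer step over a C string: returns EOF (rather than
 * a bare -1) at end of input, NONE for a space, TOKEN otherwise. */
static int next_class(const char **p)
{
    char c = **p;
    if (c == '\0')
        return EOF;          /* end of input: pointer is not advanced */
    (*p)++;
    return c == ' ' ? NONE : TOKEN;
}
```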

Thanks for the PATCH.  You should see some of it in the next release.

Gotta go now - Saturday morning familial duties :-)

David

-- 
David Relson                   Osage Software Systems, Inc.
relson at osagesoftware.com       Ann Arbor, MI 48103
www.osagesoftware.com          tel:  734.821.8800




More information about the bogofilter-dev mailing list