TO: vs CC:
relson at osagesoftware.com
Thu Jan 15 08:18:15 EST 2004
On Thu, 15 Jan 2004 08:49:30 +0100
Boris 'pi' Piwinger <3.14 at logic.univie.ac.at> wrote:
> David Relson <relson at osagesoftware.com> wrote:
> >Anybody object to changing bogofilter to tag CC: like TO: ???
> There are two points:
> 1) Nobody did notice that for a very long time, so most
> likely the effect will be very small, maybe too small to
> make the change.
The change is 2 lines of code. In lexer_v3.l add "cc" to the line that
recognizes "to" and in token.c add "case 'c':" before the "case 't':"
line. That's it. Just 2 lines of code.
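
For anyone who wants to picture the shape of that change without opening the source, here is a rough, self-contained sketch. It is not the actual bogofilter code: the `recognized` table stands in for the lexer rule, `tag_for_header()` stands in for the switch in token.c, and both names (plus the `same_name()` helper) are made up for illustration. Only the "to:" and "head:" tags are taken from this thread.

```c
#include <ctype.h>
#include <stddef.h>

/* Case-insensitive string equality (hypothetical helper). */
static int same_name(const char *a, const char *b)
{
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;
}

/* Stand-in for the lexer rule: header names that get a special tag.
 * Adding "cc" next to "to" mirrors the one-line lexer change. */
static const char *const recognized[] = { "to", "cc" };

/* Stand-in for the switch in token.c: map a header name to a tag prefix. */
static const char *tag_for_header(const char *name)
{
    size_t i;

    for (i = 0; i < sizeof recognized / sizeof recognized[0]; i++) {
        if (same_name(name, recognized[i])) {
            switch (tolower((unsigned char)name[0])) {
            case 'c':   /* new case: Cc falls through and shares the To tag */
            case 't':
                return "to:";
            }
        }
    }
    return "head:";     /* every other header keeps the generic tag */
}
```

With this sketch, `tag_for_header("Cc")` and `tag_for_header("To")` both give "to:", while something like "Content-Type" still gets "head:" because the name table never matches it.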
> 2) It could well be that it makes things worse. It could be
> that it makes a significant difference if things show up in
> To: or CC:. Maybe tagging as cc: is much more useful.
> So we need testing to see if that is a good idea.
It made a significant difference for my test case. You're welcome to
If you want to be thorough in your testing, here's a suggestion to see
what's of potential usefulness:
RFC 2822 has the official list of header lines. Tag each of them
differently. Feed a whole lot of ham and spam into an empty database.
Look at all the header line tokens you have, in particular their
ham/spam scores. See which tags give the most significant scores (high
spam or low ham).
After evaluating for potential usefulness, pick the 3 or 4 most likely
candidates and then test to see if they actually make a difference.
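
The "look for the most significant tags" step could be sketched roughly as below. This is only an illustration under stated assumptions: `struct tag_stats` and the three functions are hypothetical, the counts would have to come from your own database dump, and the score is a plain spam fraction (assuming ham and spam corpora of similar size), not bogofilter's real scoring calculation.

```c
#include <stddef.h>

/* Per-tag totals gathered from a test database (hypothetical layout). */
struct tag_stats {
    const char *tag;      /* e.g. "to:" or some per-header test tag */
    unsigned long ham;    /* tokens with this tag seen in ham */
    unsigned long spam;   /* tokens with this tag seen in spam */
};

/* Fraction of sightings that were spam: 0.0 is pure ham, 1.0 is pure spam.
 * A tag that was never seen is treated as neutral (0.5). */
static double spam_fraction(const struct tag_stats *t)
{
    unsigned long total = t->ham + t->spam;

    return total ? (double)t->spam / (double)total : 0.5;
}

/* How far the tag sits from the neutral 0.5 midpoint, in either direction. */
static double significance(const struct tag_stats *t)
{
    double d = spam_fraction(t) - 0.5;

    return d < 0 ? -d : d;
}

/* Index of the tag farthest from neutral, or -1 for an empty table. */
static int most_significant(const struct tag_stats *stats, size_t n)
{
    int best = -1;
    size_t i;

    for (i = 0; i < n; i++) {
        if (best < 0 || significance(&stats[i]) > significance(&stats[best]))
            best = (int)i;
    }
    return best;
}
```

Running `most_significant()` repeatedly (dropping the winner each time) would give the short list of 3 or 4 candidates worth a real test run.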
In the messages I encountered yesterday, the initial message had all the
recipients tagged with "to:" and its score was zero. The second message
had the recipients tagged with "head:" and its score was 0.50000 -
squarely in the unsure realm. Changing the tags for those 30 recipients
to "to:" changed the score to zero, which is proper. This was a clear
and obvious correction to an oversight in the implementation of header