TO: vs CC:

David Relson relson at osagesoftware.com
Thu Jan 15 14:18:15 CET 2004


On Thu, 15 Jan 2004 08:49:30 +0100
Boris 'pi' Piwinger <3.14 at logic.univie.ac.at> wrote:

> David Relson <relson at osagesoftware.com> wrote:
> 
> >Anybody object to changing bogofilter to tag CC: like TO: ???
> 
> There are two points:
> 
> 1) Nobody did notice that for a very long time, so most
> likely the effect will be very small, maybe too small to
> make the change.

Hi pi,

The change is 2 lines of code.  In lexer_v3.l add "cc" to the line that
recognizes "to" and in token.c add "case 'c':" before the "case 't':"
line.  That's it.  Just 2 lines of code.
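
To make the shape of that change concrete, here is a minimal sketch.  It
is not bogofilter's actual source: the names recognized_headers,
name_equals and header_tag are hypothetical, and the real code is split
between lexer_v3.l and token.c.  The table plays the role of the lexer
rule that recognizes "to" (and now "cc"), and the switch plays the role
of the token.c change, where "case 'c':" falls through to "case 't':".

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical stand-in for the lexer rule: header names that get
 * their own tag.  "cc" is the one-line addition. */
static const char *const recognized_headers[] = { "to", "cc" };

/* Case-insensitive comparison; header field names ignore case. */
static int name_equals(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == '\0' && *b == '\0';
}

/* Return the tag prefix for tokens found in the named header.
 * Unrecognized headers get the generic "head:" tag. */
const char *header_tag(const char *name)
{
    size_t i;
    size_t n = sizeof recognized_headers / sizeof recognized_headers[0];

    assert(name != NULL);
    for (i = 0; i < n; i++) {
        if (!name_equals(name, recognized_headers[i]))
            continue;
        switch (tolower((unsigned char)name[0])) {
        case 'c':               /* Cc: falls through, sharing To:'s tag */
        case 't':
            return "to:";
        default:
            break;
        }
    }
    return "head:";
}
```

With this sketch, header_tag("Cc") and header_tag("To") both give "to:",
while something like header_tag("Date") still gives "head:".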

> 2) It could well be that it makes things worse. It could be
> that it makes a significant difference if things show up in
> To: or CC:. Maybe tagging as cc: is much more useful.
> 
> So we need testing to see if that is a good idea.

It made a significant difference for my test case.  You're welcome to
test further.

If you want to be thorough in your testing, here's a suggestion for
finding out which header tags are potentially useful:

RFC 2822 has the official list of header fields.  Tag each of them
differently.  Feed a whole lot of ham and spam into an empty database.
Look at all the header-line tokens you have, in particular their
ham/spam scores.  See which tags give the most significant scores
(close to 1 for spam or close to 0 for ham).

After evaluating for potential usefulness, pick the 3 or 4 most likely
candidates and then test to see if they actually make a difference.
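
One way to do the "potential usefulness" pass is sketched below.  This
is an illustration under stated assumptions, not bogofilter's scoring:
spam_ratio is a plain frequency ratio standing in for the real
per-token score, and struct token_count and tag_significance are
hypothetical names.  The idea is to average, per tag, how far the
tokens carrying that tag sit from the neutral value 0.5.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record: one tagged token with its message counts. */
struct token_count {
    const char *token;   /* e.g. "to:someone" or "head:someone" */
    unsigned spam;       /* spam messages containing the token */
    unsigned ham;        /* ham messages containing the token */
};

/* Simplified stand-in for a per-token score: 0 means pure ham,
 * 1 means pure spam, 0.5 means no evidence either way. */
double spam_ratio(unsigned spam, unsigned ham,
                  unsigned n_spam, unsigned n_ham)
{
    double ps, ph;

    assert(n_spam > 0 && n_ham > 0);
    ps = (double)spam / n_spam;
    ph = (double)ham / n_ham;
    if (ps + ph == 0.0)
        return 0.5;      /* token never seen: neutral */
    return ps / (ps + ph);
}

/* Mean distance from 0.5 over all tokens whose name starts with tag.
 * Larger values suggest the tag separates ham from spam better. */
double tag_significance(const struct token_count *tokens, size_t count,
                        const char *tag, unsigned n_spam, unsigned n_ham)
{
    size_t i, hits = 0, len = strlen(tag);
    double sum = 0.0;

    for (i = 0; i < count; i++) {
        double d;
        if (strncmp(tokens[i].token, tag, len) != 0)
            continue;
        d = spam_ratio(tokens[i].spam, tokens[i].ham, n_spam, n_ham) - 0.5;
        sum += (d < 0.0) ? -d : d;
        hits++;
    }
    return hits > 0 ? sum / (double)hits : 0.0;
}
```

Calling tag_significance once per tag and sorting the results gives a
rough ranking from which to pick the 3 or 4 candidates; the real test
is still whether classification results change.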

In the messages I encountered yesterday, the initial message had all the
recipients tagged with "to:" and its score was zero.  The second message
had the recipients tagged with "head:" and its score was 0.50000 -
squarely in the unsure realm.  Changing the tags for those 30 recipients
to "to:" changed the score to zero, which is proper.  This was a clear
and obvious correction to an oversight in the implementation of header
tagging.

David



