Using casefolded wordlists

Greg Louis glouis at dynamicro.on.ca
Fri May 30 22:59:23 CEST 2003


On 20030530 (Fri) at 1524:48 +0100, Peter Bishop wrote:

> > I suggested advising people to classify with -Pi but
> > train with -PI for a couple of months if they couldn't rebuild their
> > training databases; alternatively, one could speed up the process (as I
> > did at work, where I can't rebuild) by training, with -PI, on a large
> > batch of new messages (roughly equal numbers of spam and nonspam) right
> > after doing the upgrade.
> > 
> 
> Looks like a good strategy
> Do you have any performance measures before and after the changeover?

ns=nonspam, sp=spam, fp=false positive (the six fp in this run were
"legitimate" in the sense that they were highly spammy-looking and I
prefer to see them rated as spam):

pre: 5,621 correct ns, 347 correct sp, 1 sp rated ns,  6 fp,
     8 sp unsure,  98 ns unsure, 2.53% spam delivered

Iht: 5,598 correct ns, 355 correct sp, 0 sp rated ns,  6 fp,
     1 sp unsure, 121 ns unsure, 0.28% spam delivered

That was immediately after the changeover, rating the same set of
personal messages.  I also evaluated the next six days' worth of email
from work: I couldn't do these with the old conditions, so the "pre"
values are from an April run:

pre: 12,550 correct ns, 12,683 correct sp, 0 sp rated ns, 0 fp,
     873 sp unsure, 2,486 ns unsure, 6.44% spam delivered

Iht: 6,619 correct ns, 7,763 correct sp, 0 sp rated ns, 0 fp,
     47 sp unsure, 378 ns unsure, 0.60% spam delivered.

Makes quite a difference.
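
For anyone wanting to check the arithmetic: the "spam delivered" figures
above appear to be the share of all spam that was not rated as spam, that
is, (sp rated ns + sp unsure) / total sp.  That reading is an assumption
inferred from the numbers, not something stated in the runs themselves.
A minimal Python sketch (the function name and the labels are made up for
illustration) that recomputes the four percentages from the counts:

```python
def spam_delivered_pct(correct_sp, sp_rated_ns, sp_unsure):
    """Percent of all spam that reached the inbox.

    Assumption: "delivered" means spam rated nonspam plus spam rated
    unsure, divided by the total number of spam messages in the run.
    """
    total_sp = correct_sp + sp_rated_ns + sp_unsure
    return 100.0 * (sp_rated_ns + sp_unsure) / total_sp


# (correct sp, sp rated ns, sp unsure), copied from the runs above
runs = {
    "personal pre": (347, 1, 8),
    "personal Iht": (355, 0, 1),
    "work pre": (12683, 0, 873),
    "work Iht": (7763, 0, 47),
}

for label, counts in runs.items():
    print(f"{label}: {spam_delivered_pct(*counts):.2f}% spam delivered")
```

Under that assumption the results come out at 2.53, 0.28, 6.44 and 0.60,
matching the figures quoted for each run.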

-- 
| G r e g  L o u i s          | gpg public key: finger     |
|   http://www.bgl.nu/~glouis |   glouis at consultronics.com |
| http://wecanstopspam.org in signatures fights junk email |



