Using casefolded wordlists
Greg Louis
glouis at dynamicro.on.ca
Fri May 30 22:59:23 CEST 2003
On 20030530 (Fri) at 1524:48 +0100, Peter Bishop wrote:
> > I suggested advising people to classify with -Pi but
> > train with -PI for a couple of months if they couldn't rebuild their
> > training databases; alternatively, one could speed up the process (as I
> > did at work, where I can't rebuild) by training, with -PI, on a large
> > batch of new messages (roughly equal numbers of spam and nonspam) right
> > after doing the upgrade.
> >
>
> Looks like a good strategy
> Do you have any performance measures before and after the changeover?
ns=nonspam, sp=spam, fp=false positive (the six fp in this run were
"legitimate" in the sense that they were highly spammy-looking and I
prefer to see them rated as spam):
pre: 5,621 correct ns, 347 correct sp, 1 sp rated ns, 6 fp,
8 sp unsure, 98 ns unsure, 2.53% spam delivered
Iht: 5,598 correct ns, 355 correct sp, 0 sp rated ns, 6 fp,
1 sp unsure, 121 ns unsure, 0.28% spam delivered
That was immediately after the changeover, rating the same set of
personal messages. I also evaluated the next six days' worth of email
from work. I couldn't rate these under the old conditions, so the "pre"
values come from an April run on a different set of messages:
pre: 12,550 correct ns, 12,683 correct sp, 0 sp rated ns, 0 fp,
873 sp unsure, 2,486 ns unsure, 6.44% spam delivered
Iht: 6,619 correct ns, 7,763 correct sp, 0 sp rated ns, 0 fp,
47 sp unsure, 378 ns unsure, 0.60% spam delivered.
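The "spam delivered" percentages above can be reproduced from the
counts if one assumes that both spam rated ns and spam rated unsure end
up delivered, i.e. delivered = (sp rated ns + sp unsure) / total spam.
That formula is an inference from the numbers, not something stated in
the post; the Run class and its field names are purely illustrative.

```python
# Recompute the "spam delivered" figures from the counts in the post.
# ASSUMPTION (inferred, not stated by the author): spam rated nonspam
# and spam rated unsure are both delivered, so
#   delivered% = (sp rated ns + sp unsure) / total spam * 100
from dataclasses import dataclass


@dataclass
class Run:
    label: str
    correct_sp: int   # spam correctly rated as spam
    sp_rated_ns: int  # spam wrongly rated as nonspam
    sp_unsure: int    # spam rated unsure

    def spam_delivered_pct(self) -> float:
        # Spam that reaches the reader: misclassified plus unsure.
        delivered = self.sp_rated_ns + self.sp_unsure
        total_spam = self.correct_sp + delivered
        return 100.0 * delivered / total_spam


runs = [
    Run("personal pre", 347, 1, 8),
    Run("personal Iht", 355, 0, 1),
    Run("work pre", 12_683, 0, 873),
    Run("work Iht", 7_763, 0, 47),
]

for run in runs:
    print(f"{run.label:13s} {run.spam_delivered_pct():.2f}%")
```

Rounded to two decimals this gives 2.53%, 0.28%, 6.44% and 0.60%,
matching the reported values, which supports the assumed formula.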
Makes quite a difference.
--
| G r e g L o u i s | gpg public key: finger |
| http://www.bgl.nu/~glouis | glouis at consultronics.com |
| http://wecanstopspam.org in signatures fights junk email |