Using casefolded wordlists
Peter Bishop
pgb at adelard.com
Fri May 30 16:24:48 CEST 2003
On 30 May 2003 at 7:09, Greg Louis wrote:
> > From my unscientific sample it appears that ham is affected more than spam
> > and the effect could be to increase false positives until the wordlists get
> > updated with mixed case words.
>
> That could be so. I suggested advising people to classify with -Pi but
> train with -PI for a couple of months if they couldn't rebuild their
> training databases; alternatively, one could speed up the process (as I
> did at work, where I can't rebuild) by training, with -PI, on a large
> batch of new messages (roughly equal numbers of spam and nonspam) right
> after doing the upgrade.
>
Looks like a good strategy.
Do you have any performance measures before and after the changeover?
--
Peter Bishop
pgb at adelard.com
pgb at csr.city.ac.uk