parsing options

Thu May 15 18:08:54 CEST 2003

David Relson wrote, in part:
> 
> It would be nice to have either the historical defaults correspond to all 
> upper case or all lower case.
> 

Well, that raises another issue for us here.  For us, bogofilter has
been working very well, though of course improvement would be welcome.
(I haven't been doing the update-of-the-day or anything; we're still at
0.11.1.8.)

You keep saying that turning off case folding improves
accuracy (and I'm not questioning that).  But we have fairly large
databases (well, not by some people's standards) created by versions
that did case folding.  I have a strong suspicion that updating to a
current rev & turning off case folding will be a disaster in terms of
accuracy, short term, as the lower/mixed case tokens are not in either
database.  And the mail that generated those databases is,
by & large, one with the snows of yesteryear.
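
To make the concern concrete, here is a minimal Python sketch of the lookup problem. It is not bogofilter's tokenizer or database format: the whitespace tokenizer, the dict standing in for a wordlist, and the sample messages are all made up for illustration, and it assumes "case folding" means lower-casing (the same argument holds if tokens were folded to upper case).

```python
# Hypothetical illustration only; not bogofilter's actual tokenizer or wordlist format.

def tokenize(text, fold_case):
    """Split on whitespace; optionally fold tokens to lower case (assumed folding rule)."""
    words = text.split()
    return [w.lower() for w in words] if fold_case else words

def build_wordlist(messages, fold_case):
    """Count tokens across training messages (a dict standing in for a wordlist database)."""
    counts = {}
    for msg in messages:
        for tok in tokenize(msg, fold_case):
            counts[tok] = counts.get(tok, 0) + 1
    return counts

def hit_rate(wordlist, message, fold_case):
    """Fraction of the message's tokens that are found in the wordlist."""
    toks = tokenize(message, fold_case)
    if not toks:
        return 0.0
    return sum(1 for t in toks if t in wordlist) / len(toks)

# Wordlist built by an older version that folded case.
old_training = ["FREE Offer Click HERE", "Meeting Notes for Monday"]
wordlist = build_wordlist(old_training, fold_case=True)

# A new message scored against that same wordlist, with and without folding.
new_message = "FREE Offer for Monday"
folded_rate = hit_rate(wordlist, new_message, fold_case=True)
unfolded_rate = hit_rate(wordlist, new_message, fold_case=False)
print(folded_rate, unfolded_rate)
```

With folding on, every token of the new message is found in the old wordlist; with folding off, only the token that was already all lower case is found, and the rest look like words never seen before.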

Is any improvement from turning off case folding going to be worth the
hassle of retraining bogofilter over several weeks?

-- 
- Dave Lovelace
  dave at firstcomp.biz
  davel at cyberspace.org