<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<html>

<head>

  <meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1">

  <title></title>

</head>

<body>

<br>

<br>

Dave Lovelace wrote:<br>

<blockquote type="cite" cite="mid200305301945.PAA11010@firstcomp.biz">

  <pre wrap="">Jef Poskanzer wrote:

  </pre>

  <blockquote type="cite">

    <blockquote type="cite">

      <pre wrap="">This would help migration from a casefolded database as classification 

algorithm would degenerate to the existing lower case method and 

performance would be no worse than before. 

      </pre>

    </blockquote>

    <pre wrap="">I'm not 100% sure I'm following the discussion correctly, but

couldn't you also handle the migration issue with a little script

that dumps the database, duplicates all-lowercase tokens with

capitalized and all-uppercase versions, and makes a new db?

---

Jef

         Jef Poskanzer  <a class="moz-txt-link-abbreviated" href="mailto:jef@acme.com">jef@acme.com</a>  <a class="moz-txt-link-freetext" href="http://www.acme.com/jef/">http://www.acme.com/jef/</a>

    </pre>

  </blockquote>

  <pre wrap="">That would not suffice.  It would add "Spam" and "SPAM" but not "SPam",

"sPam", "sPAm", "SPAm", "SpAm", ...

And I personally don't think adding every variant on every token is what

anyone would want.

  </pre>

</blockquote>

Especially since that would bloat the db from 1 token per word to as many as

2^n tokens per word (where n is the number of letters in the word).<br>
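<br>
To put a number on that, here is a minimal sketch in Python that enumerates every
upper/lower-case spelling of a token. The function name case_variants is made up
for illustration and is not part of any filter's code, and it assumes plain ASCII
tokens, where each letter has exactly two cases and every other character has one.<br>
<pre wrap="">
```python
from itertools import product


def case_variants(word):
    """Return every upper/lower-case spelling of word (ASCII assumed)."""
    # Each letter contributes two choices; digits and punctuation contribute one.
    choices = [
        (ch.lower(), ch.upper()) if ch.isalpha() else (ch,)
        for ch in word
    ]
    # The Cartesian product of the per-character choices is the full variant set.
    return ["".join(combo) for combo in product(*choices)]


variants = case_variants("spam")
print(len(variants))   # 16, i.e. 2**4 for a four-letter word
print(variants[:4])    # ['spam', 'spaM', 'spAm', 'spAM']
```
</pre>
A four-letter token already becomes 16 entries, and a ten-letter one would become
1024, so duplicating every variant does not look practical.<br>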

</body>

</html>