On 11 Aug 2003 at 15:16, Matthias Andree wrote:

> So should we drop the "minimum token size" limit to deal with " B R O K
> E N U P " tokens?

Or should the tokeniser treat a sequence of space-separated single letters
as a single token? e.g.:

  B R O K E N  U P

is tokenised as:

  B-R-O-K-E-N U-P

-- 
Peter Bishop
pgb at adelard.com
pgb at csr.city.ac.uk
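
A rough sketch of the idea above, in Python rather than in the tokeniser's own code, just to make the proposed behaviour concrete. The names join_spaced_letters and tokenise are made up for illustration, and the rule that a gap wider than one space ends a run is an assumption, not something settled in this thread:

```python
import re

# Assumption: a "run" is two or more single letters, each separated from the
# next by exactly one space. Two or more spaces (or any other character)
# end the run. The lookarounds make sure each letter stands alone.
_SPACED_RUN = re.compile(r"(?<!\S)[A-Za-z](?: [A-Za-z])+(?!\S)")


def join_spaced_letters(text):
    """Replace the single spaces inside each run of lone letters with '-'."""
    return _SPACED_RUN.sub(lambda m: m.group(0).replace(" ", "-"), text)


def tokenise(text):
    """Hypothetical tokeniser: join spaced-out letters, then split on whitespace."""
    return join_spaced_letters(text).split()


if __name__ == "__main__":
    print(tokenise("B R O K E N  U P"))
    print(tokenise("this is a test"))
```

With only single spaces throughout ("B R O K E N U P") this sketch produces one long token, so a real implementation would need some rule for where one spaced-out word ends and the next begins.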