Serious problem with non-ASCII words
Jonathan Buzzard
jonathan at buzzard.org.uk
Fri Sep 20 20:26:25 CEST 2002
3.14 at logic.univie.ac.at said:
> Clearly whitespace and line ending are word delimiters. Also
> punctuation. This assumes we have charsets which are compatible with
> ASCII, though. But I don't see how we can do better. How about
> hyphens?
Well, we could try paying attention to the "Content-type" header.
For example, the original mail in this thread had a Content-type
header like this:
Content-type: text/plain; charset=ISO-8859-1
And your mail had one like this:
Content-type: text/plain; charset=us-ascii
One would have thought that from this point it is fairly obvious what
to do: interpret each message according to the charset it declares.
Frankly, Bogofilter should handle the same email in the same way
regardless of what locale it happens to be running under at the time.
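
As a rough illustration of the idea, here is a minimal Python sketch. It is
not Bogofilter's code (Bogofilter is written in C); the function name
tokens_from_message and the fallback choices are assumptions made for this
example only. It reads the charset from the Content-type header, decodes the
body with it, and only then splits into words, so the result does not depend
on the locale of the process.

```python
import re
from email import message_from_bytes


def tokens_from_message(raw):
    """Split a single-part message body into words using its declared charset.

    Hypothetical helper for illustration; handles non-multipart mail only.
    """
    msg = message_from_bytes(raw)
    # Charset parameter of the Content-type header; if it is absent,
    # fall back to us-ascii (the MIME default for text/plain).
    charset = msg.get_content_charset() or "us-ascii"
    # Raw body bytes, with any Content-Transfer-Encoding already undone.
    body = msg.get_payload(decode=True)
    try:
        text = body.decode(charset, errors="replace")
    except LookupError:
        # Unknown charset name: assumption for this sketch, map every byte
        # to a character via latin-1 rather than dropping the message.
        text = body.decode("latin-1")
    # \w on a str pattern is Unicode-aware, so accented letters stay
    # inside words instead of acting as delimiters.
    return re.findall(r"\w+", text)


latin1_mail = b"Content-type: text/plain; charset=ISO-8859-1\n\nna\xefve caf\xe9\n"
ascii_mail = b"Content-type: text/plain; charset=us-ascii\n\nplain words here\n"
print(tokens_from_message(latin1_mail))
print(tokens_from_message(ascii_mail))
```

Because the decoding is driven by the header rather than by the environment,
the same mail yields the same tokens under any locale.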
JAB.
--
Jonathan A. Buzzard Email: jonathan at buzzard.org.uk
Northumberland, United Kingdom. Tel: +44(0)1661-832195
For summary digest subscription: bogofilter-digest-subscribe at aotto.com