decoding implementation

Matthias Andree matthias.andree at gmx.de
Mon Nov 25 05:16:32 CET 2002


Clint Adams <schizo at debian.org> writes:

> I don't think that we should.  The difference is significant.
> Except in a message like this, I don't think anyone from whom I want to
> receive mail is going to spell 'zoo' as zοо, zоo, or zoο.  Barring some
> accident, they're going to spell it 'zoo'.  Assuming this is true, I'd
> want 'zoo' to be a non-spam token, and the Greco-Russian spellings to
> be spam tokens.

You have a valid point there.
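
To make the distinction concrete, here is a minimal sketch (Python, not
bogofilter code) of how a token that mixes scripts could be told apart
from plain 'zoo'. The name scripts_in is hypothetical, and taking the
first word of the Unicode character name as the "script" is only a rough
approximation that I am assuming is good enough for Latin, Greek and
Cyrillic letters.

```python
import unicodedata


def scripts_in(token):
    """Return the set of script labels used by the letters in token.

    Assumption: the first word of the Unicode character name
    (LATIN, GREEK, CYRILLIC, ...) stands in for the script.
    """
    scripts = set()
    for ch in token:
        if ch.isalpha():
            # "UNKNOWN" is a fallback for characters without a name.
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


plain = "zoo"
# z + GREEK SMALL LETTER OMICRON + CYRILLIC SMALL LETTER O
mixed = "z\u03bf\u043e"

print(sorted(scripts_in(plain)))  # ['LATIN']
print(sorted(scripts_in(mixed)))  # ['CYRILLIC', 'GREEK', 'LATIN']
print(plain == mixed)             # False: the tokens stay distinct
```

A lexer that leaves such characters alone keeps the two spellings as
separate tokens, which is what lets one of them score as spam.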

> The same is true of high-bit apostrophes and such; I'd want the lexer to
> differentiate between them on all platforms, since I doubt that anyone
> will send me legitimate mail containing them.

Oh yes they will. Microsoft Office and other crap will happily send
these typographic quotes in documents and, what's even more fun, declare
ISO-8859-1 for them (rather than Windows-1252, which would be correct),
or us-ascii, or... Try typing "this is strange" (with quotes) in MS Word
and see how it changes that to “this is strange”.
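
A small sketch of what goes wrong (Python, purely illustrative): in
Windows-1252 the curly quotes are the bytes 0x93 and 0x94, which
ISO-8859-1 maps to C1 control characters. The helper decode_lenient is
hypothetical, and its policy of treating a declared ISO-8859-1 or
us-ascii as Windows-1252 whenever bytes in 0x80-0x9F show up is my
assumption, not anything bogofilter does.

```python
def decode_lenient(raw: bytes, declared: str) -> str:
    """Decode raw, distrusting the declared charset for bytes 0x80-0x9F.

    Hypothetical helper: if the sender declared ISO-8859-1 or us-ascii
    but used bytes in the C1 range, assume Windows-1252 was meant.
    """
    name = declared.lower()
    has_c1_bytes = any(0x80 <= b <= 0x9F for b in raw)
    if name in ("iso-8859-1", "us-ascii") and has_c1_bytes:
        # A few bytes (e.g. 0x81) are undefined in cp1252, hence "replace".
        return raw.decode("cp1252", errors="replace")
    return raw.decode(name, errors="replace")


raw = b"\x93this is strange\x94"

# Taking the declaration at face value yields control characters.
as_declared = raw.decode("iso-8859-1")
print(repr(as_declared[0]))   # '\x93'

# Guessing Windows-1252 recovers the typographic quotes.
guessed = decode_lenient(raw, "ISO-8859-1")
print(guessed == "\u201cthis is strange\u201d")  # True
```

Either way the lexer sees something other than an ASCII quote, so it has
to expect these bytes in legitimate mail too.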

-- 
Matthias Andree


