decoding implementation

Clint Adams schizo at debian.org
Sun Nov 24 23:19:48 CET 2002


> One other thing I can imagine though: How the heck can we treat Greek
> omikron, Latin o (oh) and Cyrillic o the same? Three different

I don't think that we should.  The difference is significant.
Except in a message like this, I don't think anyone from whom I want to
receive mail is going to spell 'zoo' as zοо, zоo, or zoο.  Barring some
accident, they're going to spell it 'zoo'.  Assuming this is true, I'd
want 'zoo' to be a non-spam token, and the Greco-Russian spellings to
be spam tokens.

The same is true of high-bit apostrophes and such; I'd want the lexer to
differentiate between them on all platforms, since I doubt that anyone
will send me legitimate mail containing them.
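A minimal sketch of the behaviour argued for above. It is in Python and assumes a simple whitespace-splitting lexer. The names `tokenize` and `scripts` are hypothetical illustrations, not bogofilter's actual lexer. The sketch shows that the look-alike spellings of 'zoo' and the curly-apostrophe form of "don't" stay separate tokens. Even Unicode NFKC normalization does not fold Greek omicron or Cyrillic o into Latin o, and it does not fold U+2019 into the ASCII apostrophe. The ones that mix scripts can therefore be told apart and learned as spam tokens:

```python
import unicodedata

# Illustrative code points (assumption: these are the look-alikes meant above).
LATIN_O = "o"              # U+006F LATIN SMALL LETTER O
GREEK_OMICRON = "\u03bf"   # U+03BF GREEK SMALL LETTER OMICRON
CYRILLIC_O = "\u043e"      # U+043E CYRILLIC SMALL LETTER O
ASCII_APOS = "'"           # U+0027 APOSTROPHE
CURLY_APOS = "\u2019"      # U+2019 RIGHT SINGLE QUOTATION MARK (byte 0x92 in cp1252)


def tokenize(text: str) -> list[str]:
    """Hypothetical lexer: split on whitespace, then apply NFKC.

    NFKC does not merge cross-script look-alikes, nor U+2019 with U+0027,
    so each spelling below remains its own token.
    """
    return [unicodedata.normalize("NFKC", t) for t in text.split()]


def scripts(token: str) -> set[str]:
    """Rough script tag for each letter: the first word of its Unicode name."""
    return {unicodedata.name(ch).split()[0] for ch in token if ch.isalpha()}


if __name__ == "__main__":
    spellings = [
        "z" + LATIN_O + LATIN_O,
        "z" + GREEK_OMICRON + CYRILLIC_O,
        "z" + CYRILLIC_O + LATIN_O,
        "z" + LATIN_O + GREEK_OMICRON,
        "don" + ASCII_APOS + "t",
        "don" + CURLY_APOS + "t",
    ]
    toks = tokenize(" ".join(spellings))
    print(len(set(toks)))  # 6 distinct tokens
    for t in toks:
        # Mixed-script tokens report more than one script name.
        print(ascii(t), sorted(scripts(t)))
```

Only plain 'zoo' and the ASCII "don't" come out as single-script Latin tokens with ordinary characters. Every other spelling is a distinct token a filter could learn as spam.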



More information about the bogofilter-dev mailing list