info about spam messages

Chris Wilkes cwilkes-bf at ladro.com
Fri Jun 11 16:33:33 CEST 2004


On Fri, Jun 11, 2004 at 09:54:04AM -0400, David Relson wrote:
> 
> Question:  How well does bogofilter's text parsing work with Turkish?

On a related note, I started to wonder what spam looks like in Korean,
Chinese, or any other language that uses glyphs.

In languages that are ASCII-based (how's that for rewriting history?)
there's a lot of spam with words that are slightly misspelled or written
in elite hacker speak:
  via-gra
  v1agr4
etc.  Is there the same beast in glyph-based languages?  Can one have a
character that looks like a real word or phrase?  Are there nonsense
words like "gra"?
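
For the ASCII case, here is a minimal sketch of how a tokenizer might fold
such variants back onto one token.  It assumes Python, and the look-alike
table and the normalize_token name are hypothetical, made up for
illustration; this is not how bogofilter itself parses text.

```python
import re

# Hypothetical look-alike table (an assumption, not bogofilter's behavior).
LEET_MAP = str.maketrans({
    "1": "i",
    "4": "a",
    "3": "e",
    "0": "o",
    "@": "a",
    "$": "s",
})


def normalize_token(token):
    """Lowercase, map digit/symbol look-alikes to letters, drop separators."""
    lowered = token.lower().translate(LEET_MAP)
    # Remove anything that is still not a plain ASCII letter (hyphens, dots).
    return re.sub(r"[^a-z]", "", lowered)


if __name__ == "__main__":
    for raw in ("via-gra", "v1agr4"):
        print(raw, "->", normalize_token(raw))
```

Both sample spellings above collapse to the same token under this table.
A statistical filter may not need this step at all, since the obfuscated
spellings are themselves strong spam tokens.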

I suppose with Unicode you can't have the top half of one glyph as one
token and the bottom half as a second token (so that when reading you're
mashing the two together to form one glyph), as there isn't a Unicode
entry for that, right?
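
One way to poke at that question is to ask Unicode normalization which
characters break into smaller code points.  The sketch below assumes
Python's standard unicodedata module; the code_points helper is a
hypothetical name used only for illustration.  A precomposed Hangul
syllable does decompose into its component jamo under NFD, while an
ordinary CJK ideograph such as U+4E2D stays a single code point.

```python
import unicodedata


def code_points(text):
    """Return the code points of text after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", text)
    return ["U+%04X" % ord(ch) for ch in decomposed]


if __name__ == "__main__":
    # U+D55C is a precomposed Hangul syllable; U+4E2D is a CJK ideograph.
    print(code_points("\ud55c"))
    print(code_points("\u4e2d"))
```

The jamo are parts of a syllable block rather than literal top and bottom
halves of a glyph, so this only partly matches the scenario described
above.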

Chris


