Unicode and spaces
Clint Adams
schizo at debian.org
Thu Mar 27 23:37:34 CET 2003
To follow up on the wide spaces: once charset conversion is
implemented, we may want to treat any or all of the following the same
as ASCII SPC.
U+0020 SPACE
UTF-8: 20 UTF-16BE: 0020 Decimal: 32
Category: Zs (Separator, Space)
Bidi: WS (Whitespace)
U+00A0 NO-BREAK SPACE
UTF-8: c2 a0 UTF-16BE: 00a0 Decimal: 160
Category: Zs (Separator, Space)
Bidi: CS (Common Number Separator)
Decomposition: <noBreak> 0020
U+1680 OGHAM SPACE MARK
UTF-8: e1 9a 80 UTF-16BE: 1680 Decimal: 5760
Category: Zs (Separator, Space)
Bidi: WS (Whitespace)
U+2000 EN QUAD
UTF-8: e2 80 80 UTF-16BE: 2000 Decimal: 8192
Category: Zs (Separator, Space)
Bidi: WS (Whitespace)
Decomposition: 2002
U+2001 EM QUAD
UTF-8: e2 80 81 UTF-16BE: 2001 Decimal: 8193
Category: Zs (Separator, Space)
Bidi: WS (Whitespace)
Decomposition: 2003
U+2002 EN SPACE
UTF-8: e2 80 82 UTF-16BE: 2002 Decimal: 8194
Category: Zs (Separator, Space)
Bidi: WS (Whitespace)
Decomposition: <compat> 0020
U+2003 EM SPACE
UTF-8: e2 80 83 UTF-16BE: 2003 Decimal: 8195
Category: Zs (Separator, Space)
Bidi: WS (Whitespace)
Decomposition: <compat> 0020
U+2004 THREE-PER-EM SPACE
UTF-8: e2 80 84 UTF-16BE: 2004 Decimal: 8196
Category: Zs (Separator, Space)
Bidi: WS (Whitespace)
Decomposition: <compat> 0020
U+2005 FOUR-PER-EM SPACE
UTF-8: e2 80 85 UTF-16BE: 2005 Decimal: 8197
Category: Zs (Separator, Space)
Bidi: WS (Whitespace)
Decomposition: <compat> 0020
U+2006 SIX-PER-EM SPACE
UTF-8: e2 80 86 UTF-16BE: 2006 Decimal: 8198
Category: Zs (Separator, Space)
Bidi: WS (Whitespace)
Decomposition: <compat> 0020
U+2007 FIGURE SPACE
UTF-8: e2 80 87 UTF-16BE: 2007 Decimal: 8199
Category: Zs (Separator, Space)
Bidi: WS (Whitespace)
Decomposition: <noBreak> 0020
U+2008 PUNCTUATION SPACE
UTF-8: e2 80 88 UTF-16BE: 2008 Decimal: 8200
Category: Zs (Separator, Space)
Bidi: WS (Whitespace)
Decomposition: <compat> 0020
U+2009 THIN SPACE
UTF-8: e2 80 89 UTF-16BE: 2009 Decimal: 8201
Category: Zs (Separator, Space)
Bidi: WS (Whitespace)
Decomposition: <compat> 0020
U+200A HAIR SPACE
UTF-8: e2 80 8a UTF-16BE: 200a Decimal: 8202
Category: Zs (Separator, Space)
Bidi: WS (Whitespace)
Decomposition: <compat> 0020
U+200B ZERO WIDTH SPACE
UTF-8: e2 80 8b UTF-16BE: 200b Decimal: 8203
Category: Zs (Separator, Space)
Bidi: BN (Boundary Neutral)
U+202F NARROW NO-BREAK SPACE
UTF-8: e2 80 af UTF-16BE: 202f Decimal: 8239
Category: Zs (Separator, Space)
Bidi: WS (Whitespace)
Decomposition: <noBreak> 0020
U+205F MEDIUM MATHEMATICAL SPACE
UTF-8: e2 81 9f UTF-16BE: 205f Decimal: 8287
Category: Zs (Separator, Space)
Bidi: WS (Whitespace)
Decomposition: <compat> 0020
U+3000 IDEOGRAPHIC SPACE
UTF-8: e3 80 80 UTF-16BE: 3000 Decimal: 12288
Category: Zs (Separator, Space)
Bidi: WS (Whitespace)
Decomposition: <wide> 0020
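
For illustration only, here is a rough sketch of what "treat the same as
ASCII SPC" could look like once the text has been decoded to Unicode. It is
written in Python rather than bogofilter's C, and the names SPACE_LIKE and
fold_spaces are made up for this example; nothing here reflects existing
bogofilter code. The code points are exactly the ones listed above.

```python
# Hypothetical sketch: fold the space-like code points listed above
# into ASCII SPC (U+0020). Assumes the input is already decoded text.

SPACE_LIKE = (
    [0x00A0, 0x1680]                 # NO-BREAK SPACE, OGHAM SPACE MARK
    + list(range(0x2000, 0x200C))    # EN QUAD .. ZERO WIDTH SPACE
    + [0x202F, 0x205F, 0x3000]       # NARROW NO-BREAK, MEDIUM MATHEMATICAL, IDEOGRAPHIC
)

# str.translate accepts a dict mapping code points to replacement strings.
_TO_SPC = {cp: " " for cp in SPACE_LIKE}


def fold_spaces(text: str) -> str:
    """Replace every listed space-like character with ASCII SPC."""
    return text.translate(_TO_SPC)


if __name__ == "__main__":
    sample = "buy\u00a0now\u3000cheap\u2009pills"
    print(fold_spaces(sample).split(" "))
    # prints ['buy', 'now', 'cheap', 'pills']
```

Note that folding U+200B ZERO WIDTH SPACE this way splits a token that
renders as a single word; whether that is wanted is a separate decision,
and dropping the character instead would be the obvious alternative.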