Unicode and spaces

Clint Adams schizo at debian.org
Thu Mar 27 23:37:34 CET 2003


To follow up on the wide spaces: once charset conversion is
implemented, we may want to treat any or all of the following
characters the same as ASCII SPC.
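Here is a minimal C sketch of what that could look like. It assumes the text has already been converted to UTF-8 (the conversion step is not shown). The names `is_unicode_space`, `utf8_decode` and `fold_spaces` are hypothetical and are not existing bogofilter functions. The set of code points is exactly the list that follows, and each matched character is replaced in place by a single ASCII SPC.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true for the code points listed in this message. */
static bool is_unicode_space(uint32_t cp)
{
    switch (cp) {
    case 0x0020: case 0x00A0: case 0x1680:
    case 0x200B: case 0x202F: case 0x205F: case 0x3000:
        return true;
    default:
        return cp >= 0x2000 && cp <= 0x200A;  /* EN QUAD .. HAIR SPACE */
    }
}

/* Decode one 1-3 byte UTF-8 sequence (enough for the BMP characters listed).
 * Stores the code point in *cp and returns its length, or 0 if the bytes at s
 * do not start such a sequence. Sketch only: overlong forms and surrogates
 * are not rejected. */
static size_t utf8_decode(const unsigned char *s, size_t len, uint32_t *cp)
{
    if (len >= 1 && s[0] < 0x80) {
        *cp = s[0];
        return 1;
    }
    if (len >= 2 && (s[0] & 0xE0) == 0xC0 && (s[1] & 0xC0) == 0x80) {
        *cp = ((uint32_t)(s[0] & 0x1F) << 6) | (s[1] & 0x3F);
        return 2;
    }
    if (len >= 3 && (s[0] & 0xF0) == 0xE0 && (s[1] & 0xC0) == 0x80 &&
        (s[2] & 0xC0) == 0x80) {
        *cp = ((uint32_t)(s[0] & 0x0F) << 12) |
              ((uint32_t)(s[1] & 0x3F) << 6) | (s[2] & 0x3F);
        return 3;
    }
    return 0;
}

/* Replace each listed space in a UTF-8 buffer with ASCII SPC, in place.
 * Returns the new length, which is never larger than len. Anything else,
 * including 4-byte and malformed sequences, is copied through unchanged. */
static size_t fold_spaces(unsigned char *buf, size_t len)
{
    size_t in = 0, out = 0;

    while (in < len) {
        uint32_t cp;
        size_t n = utf8_decode(buf + in, len - in, &cp);

        if (n > 0 && is_unicode_space(cp)) {
            buf[out++] = ' ';
            in += n;
        } else {
            /* copy the whole sequence, or a single stray byte */
            size_t k = n > 0 ? n : 1;
            while (k-- > 0)
                buf[out++] = buf[in++];
        }
        assert(out <= in);  /* writes never overtake unread input */
    }
    return out;
}
```

A design note: most entries below carry a <compat>, <noBreak> or <wide> decomposition to 0020 (U+2000 and U+2001 get there via EN SPACE and EM SPACE). NFKC normalization would therefore also map them to U+0020. U+1680 and U+200B have no decomposition, so they would still need an explicit check like the one above.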

U+0020 SPACE 
UTF-8: 20   UTF-16BE: 0020   Decimal: 32
 
Category: Zs (Separator, Space)
Bidi: WS (Whitespace)

U+00A0 NO-BREAK SPACE 
UTF-8: c2 a0   UTF-16BE: 00a0   Decimal: 160

Category: Zs (Separator, Space)
Bidi: CS (Common Number Separator)
Decomposition: <noBreak> 0020

U+1680 OGHAM SPACE MARK 
UTF-8: e1 9a 80   UTF-16BE: 1680   Decimal: 5760

Category: Zs (Separator, Space)
Bidi: WS (Whitespace)

U+2000 EN QUAD 
UTF-8: e2 80 80   UTF-16BE: 2000   Decimal: 8192

Category: Zs (Separator, Space)
Bidi: WS (Whitespace)
Decomposition: 2002

U+2001 EM QUAD 
UTF-8: e2 80 81   UTF-16BE: 2001   Decimal: 8193

Category: Zs (Separator, Space)
Bidi: WS (Whitespace)
Decomposition: 2003

U+2002 EN SPACE 
UTF-8: e2 80 82   UTF-16BE: 2002   Decimal: 8194

Category: Zs (Separator, Space)
Bidi: WS (Whitespace)
Decomposition: <compat> 0020

U+2003 EM SPACE 
UTF-8: e2 80 83   UTF-16BE: 2003   Decimal: 8195

Category: Zs (Separator, Space)
Bidi: WS (Whitespace)
Decomposition: <compat> 0020

U+2004 THREE-PER-EM SPACE 
UTF-8: e2 80 84   UTF-16BE: 2004   Decimal: 8196

Category: Zs (Separator, Space)
Bidi: WS (Whitespace)
Decomposition: <compat> 0020

U+2005 FOUR-PER-EM SPACE 
UTF-8: e2 80 85   UTF-16BE: 2005   Decimal: 8197

Category: Zs (Separator, Space)
Bidi: WS (Whitespace)
Decomposition: <compat> 0020

U+2006 SIX-PER-EM SPACE 
UTF-8: e2 80 86   UTF-16BE: 2006   Decimal: 8198

Category: Zs (Separator, Space)
Bidi: WS (Whitespace)
Decomposition: <compat> 0020

U+2007 FIGURE SPACE 
UTF-8: e2 80 87   UTF-16BE: 2007   Decimal: 8199

Category: Zs (Separator, Space)
Bidi: WS (Whitespace)
Decomposition: <noBreak> 0020

U+2008 PUNCTUATION SPACE 
UTF-8: e2 80 88   UTF-16BE: 2008   Decimal: 8200

Category: Zs (Separator, Space)
Bidi: WS (Whitespace)
Decomposition: <compat> 0020

U+2009 THIN SPACE 
UTF-8: e2 80 89   UTF-16BE: 2009   Decimal: 8201

Category: Zs (Separator, Space)
Bidi: WS (Whitespace)
Decomposition: <compat> 0020

U+200A HAIR SPACE 
UTF-8: e2 80 8a   UTF-16BE: 200a   Decimal: 8202

Category: Zs (Separator, Space)
Bidi: WS (Whitespace)
Decomposition: <compat> 0020

U+200B ZERO WIDTH SPACE 
UTF-8: e2 80 8b   UTF-16BE: 200b   Decimal: 8203

Category: Zs (Separator, Space)
Bidi: BN (Boundary Neutral)

U+202F NARROW NO-BREAK SPACE 
UTF-8: e2 80 af   UTF-16BE: 202f   Decimal: 8239

Category: Zs (Separator, Space)
Bidi: WS (Whitespace)
Decomposition: <noBreak> 0020

U+205F MEDIUM MATHEMATICAL SPACE 
UTF-8: e2 81 9f   UTF-16BE: 205f   Decimal: 8287

Category: Zs (Separator, Space)
Bidi: WS (Whitespace)
Decomposition: <compat> 0020

U+3000 IDEOGRAPHIC SPACE 
UTF-8: e3 80 80   UTF-16BE: 3000   Decimal: 12288

Category: Zs (Separator, Space)
Bidi: WS (Whitespace)
Decomposition: <wide> 0020
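The UTF-8 column above follows mechanically from the code point. Anything below U+0080 is one byte. Anything below U+0800 is two bytes (110xxxxx 10xxxxxx). The rest of the BMP is three bytes (1110xxxx 10xxxxxx 10xxxxxx). For these BMP characters, the UTF-16BE column is simply the code point itself. The following C sketch reproduces the UTF-8 byte sequences, which might be useful, for example, to generate a byte-matching table. `utf8_encode_bmp` is a hypothetical name and only handles code points up to U+FFFF.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode a BMP code point (U+0000..U+FFFF) as UTF-8
 * into out[0..2] and return the byte count. Surrogates are not rejected. */
static size_t utf8_encode_bmp(uint32_t cp, unsigned char out[3])
{
    assert(cp <= 0xFFFF);

    if (cp < 0x80) {                         /* 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                        /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    out[0] = (unsigned char)(0xE0 | (cp >> 12));          /* 1110xxxx */
    out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));  /* 10xxxxxx */
    out[2] = (unsigned char)(0x80 | (cp & 0x3F));         /* 10xxxxxx */
    return 3;
}
```

For example, U+1680 gives e1 9a 80 and U+205F gives e2 81 9f, matching the entries above.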

More information about the Bogofilter mailing list