Unicode and spaces
Clint Adams
schizo at debian.org
Thu Mar 27 23:37:34 CET 2003

To follow up on the wide spaces: once charset conversion is
implemented, we may want to treat any or all of the following the same
as ASCII SPC.
U+0020 SPACE
UTF-8: 20   UTF-16BE: 0020   Decimal: 32
Category: Zs (Separator, Space)
Bidi: WS (Whitespace)

U+00A0 NO-BREAK SPACE
UTF-8: c2 a0   UTF-16BE: 00a0   Decimal: 160
Category: Zs (Separator, Space)
Bidi: CS (Common Number Separator)
Decomposition: <noBreak> 0020

U+1680 OGHAM SPACE MARK
UTF-8: e1 9a 80   UTF-16BE: 1680   Decimal: 5760
Category: Zs (Separator, Space)
Bidi: WS (Whitespace)

U+2000 EN QUAD
UTF-8: e2 80 80   UTF-16BE: 2000   Decimal: 8192
Category: Zs (Separator, Space)
Bidi: WS (Whitespace)
Decomposition: 2002

U+2001 EM QUAD
UTF-8: e2 80 81   UTF-16BE: 2001   Decimal: 8193
Category: Zs (Separator, Space)
Bidi: WS (Whitespace)
Decomposition: 2003

U+2002 EN SPACE
UTF-8: e2 80 82   UTF-16BE: 2002   Decimal: 8194
Category: Zs (Separator, Space)
Bidi: WS (Whitespace)
Decomposition: <compat> 0020

U+2003 EM SPACE
UTF-8: e2 80 83   UTF-16BE: 2003   Decimal: 8195
Category: Zs (Separator, Space)
Bidi: WS (Whitespace)
Decomposition: <compat> 0020

U+2004 THREE-PER-EM SPACE
UTF-8: e2 80 84   UTF-16BE: 2004   Decimal: 8196
Category: Zs (Separator, Space)
Bidi: WS (Whitespace)
Decomposition: <compat> 0020

U+2005 FOUR-PER-EM SPACE
UTF-8: e2 80 85   UTF-16BE: 2005   Decimal: 8197
Category: Zs (Separator, Space)
Bidi: WS (Whitespace)
Decomposition: <compat> 0020

U+2006 SIX-PER-EM SPACE
UTF-8: e2 80 86   UTF-16BE: 2006   Decimal: 8198
Category: Zs (Separator, Space)
Bidi: WS (Whitespace)
Decomposition: <compat> 0020

U+2007 FIGURE SPACE
UTF-8: e2 80 87   UTF-16BE: 2007   Decimal: 8199
Category: Zs (Separator, Space)
Bidi: WS (Whitespace)
Decomposition: <noBreak> 0020

U+2008 PUNCTUATION SPACE
UTF-8: e2 80 88   UTF-16BE: 2008   Decimal: 8200
Category: Zs (Separator, Space)
Bidi: WS (Whitespace)
Decomposition: <compat> 0020

U+2009 THIN SPACE
UTF-8: e2 80 89   UTF-16BE: 2009   Decimal: 8201
Category: Zs (Separator, Space)
Bidi: WS (Whitespace)
Decomposition: <compat> 0020

U+200A HAIR SPACE
UTF-8: e2 80 8a   UTF-16BE: 200a   Decimal: 8202
Category: Zs (Separator, Space)
Bidi: WS (Whitespace)
Decomposition: <compat> 0020

U+200B ZERO WIDTH SPACE
UTF-8: e2 80 8b   UTF-16BE: 200b   Decimal: 8203
Category: Zs (Separator, Space)
Bidi: BN (Boundary Neutral)

U+202F NARROW NO-BREAK SPACE
UTF-8: e2 80 af   UTF-16BE: 202f   Decimal: 8239
Category: Zs (Separator, Space)
Bidi: WS (Whitespace)
Decomposition: <noBreak> 0020

U+205F MEDIUM MATHEMATICAL SPACE
UTF-8: e2 81 9f   UTF-16BE: 205f   Decimal: 8287
Category: Zs (Separator, Space)
Bidi: WS (Whitespace)
Decomposition: <compat> 0020

U+3000 IDEOGRAPHIC SPACE
UTF-8: e3 80 80   UTF-16BE: 3000   Decimal: 12288
Category: Zs (Separator, Space)
Bidi: WS (Whitespace)
Decomposition: <wide> 0020
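
For illustration only, here is a rough sketch of what "treat the same as
ASCII SPC" could look like once the text has already been converted to
Unicode. It is written in Python rather than in bogofilter's own code, and
the names SPACE_LIKE and fold_spaces are made up for this example; nothing
here reflects how bogofilter does or will do it.

```python
# Code points taken from the list above (everything except U+0020 itself).
# SPACE_LIKE and fold_spaces are hypothetical names used only in this sketch.
SPACE_LIKE = (
    0x00A0,                   # NO-BREAK SPACE
    0x1680,                   # OGHAM SPACE MARK
    *range(0x2000, 0x200C),   # EN QUAD .. ZERO WIDTH SPACE (U+2000..U+200B)
    0x202F,                   # NARROW NO-BREAK SPACE
    0x205F,                   # MEDIUM MATHEMATICAL SPACE
    0x3000,                   # IDEOGRAPHIC SPACE
)

# str.translate() accepts a dict mapping code point -> replacement string.
_TO_SPC = {cp: " " for cp in SPACE_LIKE}


def fold_spaces(text: str) -> str:
    """Replace every listed space character with ASCII SPC (U+0020)."""
    return text.translate(_TO_SPC)


if __name__ == "__main__":
    # NO-BREAK SPACE and IDEOGRAPHIC SPACE, encoded as UTF-8 bytes.
    raw = b"free\xc2\xa0money\xe3\x80\x80now"
    decoded = raw.decode("utf-8")  # stands in for the charset conversion step
    print(fold_spaces(decoded).split(" "))  # prints ['free', 'money', 'now']
```

Whether all of these should be folded is the open question above; dropping
an entry from the tuple (U+200B, say) is enough to leave that character
alone. A compatibility normalization (NFKC) would cover only the entries
that show a Decomposition line, so U+1680 and U+200B would still need
explicit handling.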
    
    