POSIX_space

David Relson relson at osagesoftware.com
Mon Jan 24 01:29:17 CET 2005


Matthias,

I like your solution!  Having bogofilter explicitly include the POSIX
definition of whitespace is an excellent approach.

When I looked at Juan's file, there were two lines that appeared to have
embedded spaces, i.e.

  64 27 69 6E 66 6F 72 6D  C3 A0 74 69 63 61 20 30  d'inform  tica 0
  20 31 20 32 30 30 35 30  31 32 32                  1 20050122

  64 27 69 6E 66 6F 72 6D  E0 74 69 63 61 20 30 20  d'inform tica 0 
  31 20 32 30 30 35 30 31  31 39                    1 20050119

The 0xA0 vs 0xE0 confusion occurred because I accidentally looked at the
second line. 

To answer Clint's question, in the parsing process, bogofilter _does_
convert some characters.  See functions map_us_ascii() and
map_iso_8859_15() and associated tables xlate_us[] and xlate_15[].  The
actual translation is done in function yyinput().  One reason for doing
this was a concern that different implementations use different character
sets, so that the meaning of [:blank:] or [:cntrl:] varies among
platforms.
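
For anyone who has not looked at that code, here is a rough sketch of the
table-driven idea.  It is not copied from bogofilter: the names
xlate_sketch[], init_xlate_sketch() and translate_buffer() are made up for
this example, and the choice of which bytes get mapped to a blank is only
an assumption for illustration.

```c
#include <stddef.h>

/* Hypothetical 256-entry translation table, one slot per byte value. */
static unsigned char xlate_sketch[256];

/* Fill the table: identity mapping by default, then fold a few bytes
 * into a plain blank.  Which bytes are folded is an assumption here. */
static void init_xlate_sketch(void)
{
    int i;

    for (i = 0; i < 256; i++)
        xlate_sketch[i] = (unsigned char)i;

    /* Control characters other than tab, newline and carriage return. */
    for (i = 0; i < 0x20; i++) {
        if (i != '\t' && i != '\n' && i != '\r')
            xlate_sketch[i] = ' ';
    }
    xlate_sketch[0x7F] = ' ';   /* DEL */
    xlate_sketch[0xA0] = ' ';   /* no-break space in ISO-8859-15 */
}

/* Translate a buffer in place, one table lookup per byte. */
static void translate_buffer(unsigned char *buf, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++)
        buf[i] = xlate_sketch[buf[i]];
}
```

Because the table is fixed at build time rather than asked of the C
library at run time, the result of the mapping does not change with the
platform or locale.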

Anyhow, Matthias' addition of POSIX_space is platform independent.
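
To illustrate what "platform independent" buys us, here is a minimal
sketch of an explicit whitespace test.  The function name is_posix_space()
is hypothetical and this is not Matthias' actual change; it just spells
out the six characters that isspace() accepts in the POSIX ("C") locale,
so nothing depends on the current locale.

```c
#include <stdbool.h>

/* Hypothetical helper: true only for the six whitespace characters of
 * the POSIX ("C") locale: space, tab, newline, vertical tab, form feed
 * and carriage return.  No locale lookup is involved. */
static bool is_posix_space(unsigned char c)
{
    return c == ' '  || c == '\t' || c == '\n' ||
           c == '\v' || c == '\f' || c == '\r';
}
```

With a test like this, bytes such as 0xA0 and 0xE0 are never treated as
whitespace, whatever the platform's isspace() would say for the current
locale.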

Regards,

David


