OpenBSD 3.4 isspace() b0rked (was: Problem compacting databases (again!))

Otto Moerbeek otto at drijf.net
Mon Jan 24 10:12:58 CET 2005


On Sun, 23 Jan 2005, Matthias Andree wrote:

> Matthias Andree <matthias.andree at gmx.de> writes:
> 
> > That has no relevance. isspace() is only valid for EOF and the values
> > that can be represented in an "unsigned char". 0xffffffe0 does not fall
> > into this category, hence the isspace() behavior is undefined.
> 
> Bingo - I tried this on SourceForge's compile farm. The second line
> should have been 0xa0 0 (zero as the 2nd number) according to POSIX and
> OpenBSD documentation, but we got 8 (eight) instead.
> 
> x86-openbsd1:~$ ./try
> 0xa0 8
> x86-openbsd1:~$ uname -a
> OpenBSD x86-openbsd1.cf.sourceforge.net 3.4 GENERIC#18 i386
> x86-openbsd1:~$ shar try.c
> # This is a shell archive.  Save it in a file, remove anything before
> # this line, and then unpack it by entering "sh file".  Note, it may
> # create directories; files and directories will be owned by you and
> # have default permissions.
> #
> # This archive contains:
> #
> #       try.c
> #
> echo x - try.c
> sed 's/^X//' >try.c << 'END-of-try.c'
> X#include <stdio.h>
> X#include <ctype.h>
> X
> Xint main(int argc, char **argv)
> X{
> X        unsigned char x = 0xa0;
> X        printf("0x%02x %d\n", x, isspace(x));
> X        return (0);
> X}
> END-of-try.c
> exit
> 
> OpenBSD claims compliance, in the isspace(3) manual page.
> 
>      The isspace() function tests for the standard whitespace characters for
>      which isalnum(3) is false.  The standard whitespace characters are the
>      following:
> 
>            ` '    Space character.
>            \f     Form feed.
>            \n     New-line.
>            \r     Carriage return.
>            \t     Horizontal tab.
>            \v     And vertical tab.
> 
>      In the C locale, isspace() returns true only for the standard whitespace
>      characters.
> 
> But it fails to meet its own specification. Cc'd to bugs at openbsd.org
> 
> Note that bogofilter does not switch locale, so it is in the default C
> or POSIX locale (the two are synonymous).

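The 0xffffffe0 value mentioned in the quoted text is what a byte such as
0xe0 becomes when it is held in a signed char and promoted to int. A
minimal sketch of the usual guard follows, assuming a platform where plain
char is signed; safe_isspace() is a hypothetical name used for
illustration, not a bogofilter or libc function.

```c
#include <ctype.h>

/* Hypothetical helper: normalise a (possibly signed) char before the
 * ctype call. */
static int safe_isspace(char c)
{
    /* Converting through unsigned char keeps the argument within
     * 0..UCHAR_MAX, so a byte such as 0xe0 is never passed as the
     * sign-extended value 0xffffffe0. */
    return isspace((unsigned char)c) != 0;
}
```

This only makes the call well defined. What isspace() returns for bytes
above 0x7f still depends on the C library, as the 0xa0 result shows.
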
The code computes isspace() assuming ISO 8859, where 0xa0 is the
non-breaking space character. Looking at POSIX
(http://www.opengroup.org/onlinepubs/000095399/basedefs/xbd_chap07.html):
=========================================================================== 
space

Define characters to be classified as white-space characters.

In the POSIX locale, at a minimum, the <space>, <form-feed>, <newline>, 
<carriage-return>, <tab>, and <vertical-tab> shall be included.
===========================================================================

So extensions are allowed in the POSIX locale. It seems the man page is
wrong, and the word 'only' has to be dropped.
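
Since the set may be extended, a program that wants exactly the six
standard whitespace characters could test for them itself instead of
relying on the library's tables. A minimal sketch, where is_std_space()
is a hypothetical name and not something bogofilter or OpenBSD provides:

```c
/* Hypothetical helper: true only for the six whitespace characters
 * that POSIX requires at a minimum in the "space" class. */
static int is_std_space(int c)
{
    switch (c) {
    case ' ':   /* space */
    case '\f':  /* form feed */
    case '\n':  /* newline */
    case '\r':  /* carriage return */
    case '\t':  /* horizontal tab */
    case '\v':  /* vertical tab */
        return 1;
    default:
        return 0;   /* includes 0xa0 and every other value */
    }
}
```

With this, is_std_space(0xa0) is 0 regardless of how the C library
classifies that byte.
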

	-Otto




