OpenBSD 3.4 isspace() b0rked (was: Problem compacting databases (again!))
Otto Moerbeek
otto at drijf.net
Mon Jan 24 10:12:58 CET 2005
On Sun, 23 Jan 2005, Matthias Andree wrote:
> Matthias Andree <matthias.andree at gmx.de> writes:
>
> > That has no relevance. isspace() is only valid for EOF and the values
> > that can be represented in an "unsigned char". 0xffffffe0 does not fall
> > into this category, hence the isspace() behavior is undefined.
>
> Bingo - I tried this on SourceForge's compile farm. The second line
> should have been 0xa0 0 (zero as the 2nd number) according to POSIX and
> OpenBSD documentation, but we got 8 (eight) instead.
>
> x86-openbsd1:~$ ./try
> 0xa0 8
> x86-openbsd1:~$ uname -a
> OpenBSD x86-openbsd1.cf.sourceforge.net 3.4 GENERIC#18 i386
> x86-openbsd1:~$ shar try.c
> # This is a shell archive. Save it in a file, remove anything before
> # this line, and then unpack it by entering "sh file". Note, it may
> # create directories; files and directories will be owned by you and
> # have default permissions.
> #
> # This archive contains:
> #
> # try.c
> #
> echo x - try.c
> sed 's/^X//' >try.c << 'END-of-try.c'
> X#include <stdio.h>
> X#include <ctype.h>
> X
> Xint main(int argc, char **argv)
> X{
> X unsigned char x = 0xa0;
> X printf("0x%02x %d\n", x, isspace(x));
> X return (0);
> X}
> END-of-try.c
> exit
>
> OpenBSD claims compliance, in the isspace(3) manual page.
>
> The isspace() function tests for the standard whitespace characters for
> which isalnum(3) is false. The standard whitespace characters are the
> following:
>
> ` ' Space character.
> \f Form feed.
> \n New-line.
> \r Carriage return.
> \t Horizontal tab.
> \v And vertical tab.
>
> In the C locale, isspace() returns true only for the standard whitespace
> characters.
>
> But it fails to meet its own specification. Cc'd to bugs at openbsd.org
>
> Note that bogofilter does not switch locale, so it is in the default C
> or POSIX locale (this is synonymous).
OpenBSD's isspace() code classifies characters assuming ISO 8859, where
0xa0 is the non-breaking space character. Looking at POSIX
(http://www.opengroup.org/onlinepubs/000095399/basedefs/xbd_chap07.html):
===========================================================================
space
Define characters to be classified as white-space characters.
In the POSIX locale, at a minimum, the <space>, <form-feed>, <newline>,
<carriage-return>, <tab>, and <vertical-tab> shall be included.
===========================================================================
So extensions are allowed in the POSIX locale. It seems the man page is
wrong, and the word 'only' has to be dropped.
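
For code that must not depend on whether a given libc extends the space
class, a rough sketch of two workarounds follows. Both helper names
(is_standard_space and locale_isspace) are made up for illustration and
are not part of bogofilter or any libc. The first tests only the six
characters POSIX requires; the second shows the cast to unsigned char
that keeps a plain char argument inside the range isspace() is defined
for.

```c
#include <ctype.h>

/*
 * Hypothetical helper: true only for the six whitespace characters that
 * POSIX requires in the space class. It ignores the locale, so 0xa0 is
 * never reported as whitespace.
 */
static int is_standard_space(int c)
{
    return c == ' '  || c == '\f' || c == '\n' ||
           c == '\r' || c == '\t' || c == '\v';
}

/*
 * Hypothetical wrapper: safe to call with a plain char, which may be
 * signed. The cast keeps the argument representable as unsigned char,
 * avoiding the undefined behavior of passing a negative value that is
 * not EOF. The result for bytes such as 0xa0 still depends on the libc.
 */
static int locale_isspace(char c)
{
    return isspace((unsigned char)c) != 0;
}
```

With these, is_standard_space(0xa0) is 0 on every platform, while
locale_isspace() may still report 0xa0 as whitespace where the libc
extends the class, as seen on OpenBSD 3.4 above.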
-Otto