Problem compacting databases (again!)

Juan J. Martinez reidrac at blackshell.usebox.net
Sun Jan 23 23:13:47 CET 2005


On 23/01/05 22:59, David Relson wrote:
> On Sun, 23 Jan 2005 22:20:29 +0100
> Juan J. Martinez wrote:
>>It happened again:
>>
>># bogoutil -d wordlist.db | bogoutil -l wordlist.db.new
>># bogoutil: Unexpected input [d'informÃ] on line 25173. Expecting 
>>whitespace before count.
>>
[...]

> It looks like there's an 0xE0 character in that position.
> 
> #include <stdio.h>
> #include <ctype.h>
> 
> int main(int argc, char **argv)
> {
>     char x = 0xE0;
>     printf ("0x%02x %d\n", x, isspace(x));
>     return (0);
> }
> 
> 
> Can you compile and run this program?  The output I get is
> 
> 0xFFFFFFE0 0
> 
> I bet you get 0xFFFFFFE0 1 (or something similar).

$ ./main
0xffffffe0 0

> If I'm right, then bogoutil needs a more thorough check than isspace()
> because OpenBSD is doing something unusual.

I don't know... looking at words.txt, I see:
d'informació 0 1 20050120
d'informà 0 3 20050120
d'informà tica 0 1 20050122
d'informàtica 0 1 20050119
d'instal·ladors 0 1 20050119

I'm not sure "d'informÃ" is a real word; it looks like part of
"d'informà tica". Do you mean this is a bogoutil bug? Maybe it's
handling a Unicode string the wrong way? isspace() seems to be working
correctly...
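
One way to probe the Unicode idea is to run isspace() over every byte of
the affected word instead of only 0xE0. The sketch below assumes
words.txt is UTF-8 encoded, in which case "à" (U+00E0) is stored as the
two bytes 0xC3 0xA0 rather than the single byte 0xE0. SAMPLE_TOKEN and
first_space_byte() are hypothetical names made up for this illustration;
none of this is bogoutil code.

```c
#include <ctype.h>
#include <stddef.h>

/* Assumption: the word list is UTF-8, so "d'informàtica" contains the
 * byte pair 0xC3 0xA0 for "à". The literal is split into pieces so the
 * hex escapes cannot swallow the letters that follow them. */
const unsigned char SAMPLE_TOKEN[] = "d'inform" "\xC3\xA0" "tica";

/* Hypothetical helper: return the index of the first byte that
 * isspace() classifies as whitespace, or -1 if no byte qualifies.
 * The bytes are unsigned char, so every value passed to isspace() is
 * in the range the C standard allows. */
long first_space_byte(const unsigned char *s, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++) {
        if (isspace(s[i]))
            return (long)i;
    }
    return -1;
}
```

Calling first_space_byte(SAMPLE_TOKEN, sizeof SAMPLE_TOKEN - 1) on the
affected machine would show whether any byte of the word is treated as
whitespace. A result of 9 would mean the 0xA0 byte is the culprit, which
would match the token being cut off after "d'informÃ"; a result of -1
would point somewhere else. The answer can depend on the platform and
the active locale.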

Juanjo

PS: Resend... next time I'll try to remember to hit 'reply all'. Would
the list admin be so kind as to ignore the other mail from
@usebox.net? Sorry :(

-- 
Development and Systems: http://usebox.net/
          Personal page: http://usebox.net/jjm/


