Problem compacting databases (again!)

Juan J. Martinez reidrac at blackshell.usebox.net
Sun Jan 23 22:20:29 CET 2005


Hello,

It happened again:

# bogoutil -d wordlist.db | bogoutil -l wordlist.db.new
# bogoutil: Unexpected input [d'informÃ] on line 25173. Expecting 
whitespace before count.

It's the same bug as last time (the same word, too!).

I did as David suggested:

# bogoutil -d wordlist.db > wordlist.txt
# head -25173 wordlist.txt | tail -1
d'informà tica 0 1 20050122

Yeah... it's a space in a bad place.

I think the word is "d'informàtica", and it appears to be encoded in
UTF-8. The system is stock OpenBSD; I don't know whether that is
related. This is bogofilter 0.92.8 (with BerkeleyDB 4.2.52).
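
For reference, "à" is the two bytes 0xC3 0xA0 in UTF-8, and 0xA0 on its own is a no-break space in ISO-8859-1, which would explain the stray "space". The following is a minimal sketch for finding other such entries in the text dump, not part of bogofilter. It assumes each dump line has the form "token spam-count good-count date" separated by single spaces, as in the line shown above; the function name find_suspect_lines and the list of suspect bytes are hypothetical choices for illustration.

```python
import sys

# Bytes that a byte-oriented whitespace check might treat as a separator
# (assumption). 0xA0 is the second byte of UTF-8 "à" (0xC3 0xA0) and is
# a no-break space in ISO-8859-1.
SUSPECT_BYTES = b" \t\xa0"


def find_suspect_lines(lines):
    """Yield (line_number, token) for dump lines that look unsafe to reload.

    `lines` is an iterable of bytes objects, one per dump line.
    """
    for number, raw in enumerate(lines, start=1):
        stripped = raw.rstrip(b"\r\n")
        # Split from the right so the token keeps any embedded spaces.
        fields = stripped.rsplit(b" ", 3)
        if len(fields) != 4:
            # Not "token spam good date": report the whole line.
            yield number, stripped
            continue
        token = fields[0]
        if any(byte in SUSPECT_BYTES for byte in token):
            yield number, token


if __name__ == "__main__" and len(sys.argv) > 1:
    # Read in binary mode so no charset decoding hides the raw bytes.
    with open(sys.argv[1], "rb") as dump:
        for number, token in find_suspect_lines(dump):
            print(number, token)
```

Run against wordlist.txt, this prints the line number and raw bytes of each suspicious token, so the 0xA0 byte shows up as \xa0 instead of looking like a plain space.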

I remember a post on a mailing list about the charset being unset or
set wrongly, but I cannot find that message in the archives.

You can download the wordlist.db and a gzipped wordlist.txt at:
http://blackshell.usebox.net/bogofilter/

Lemme know if you need more info.

Regards,

Juanjo

-- 
Development and Systems: http://usebox.net/
          Personal page: http://usebox.net/jjm/


