bogoutil -s -m changes wordlist encoding?

Jason Lunz lunz at falooley.org
Sat Dec 31 17:16:33 CET 2005


I'm looking at the effect of bogoutil -m on my wordlist by running it on
a copy, then comparing the "bogoutil -d" output of the wordlist before
and after. When I tried "bogoutil -s 1,40 -m new/wordlist.db", the
diff showed not only that the short/long tokens had been removed, but
also that non-ASCII tokens had been changed.

For example, here are the first few hunks of the diff of the "bogoutil -d"
output for the before and after wordlists:

@@ -618,6 +618,7 @@
 $990 2 0 20050328
 $995 0 2 20041202
 $999 6 2 20050210
+.ENCODING 0 0
 .MSG_COUNT 10992 4657 20051231
 02.23.44.09 0 2 20041202
 02.41.22.65 0 4 20041202
@@ -1039,7 +1040,7 @@
 AIS 4 0 20041202
 AIX 0 3 20041202
 AJC 0 2 20041202
-AKGEYÃ~]K 7 0 20050318
+AKGEYÃ~CÂ~]K 7 0 20050318
 AKUEZE 3 0 20050121
 AL1S 2 0 20050205
 ALA 0 6 20041202
@@ -1092,7 +1093,7 @@
 AMBlEN 4 0 20041202
 AMC 0 3 20041202
 AMD 13 16 20050309
-AMD¡ 2 0 20050309
+AMDÃ~B¡ 2 0 20050309
 AMEP 2 0 20041202
 AMEP.OB 2 0 20041202
 AMERICA 8 1 20050210

Is this expected? It looks to me like the tokens have been corrupted
in the after wordlist.
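
One possible reading of the changed lines, offered only as a guess: each
byte of a token may have been treated as an ISO-8859-1 character and then
written back out as UTF-8, so that every byte at 0x80 or above turns into
two bytes. The Python sketch below illustrates that assumption. It is not
bogoutil's code, the function name is hypothetical, and the byte values are
inferred from how the diff renders rather than taken from the database.

```python
# Assumption: the maintenance pass read each token byte as ISO-8859-1 and
# re-encoded it as UTF-8. This is a guess based on the diff, not bogoutil's code.

def reencode_latin1_as_utf8(token: bytes) -> bytes:
    """Interpret each byte as one ISO-8859-1 character, then encode as UTF-8."""
    return token.decode("latin-1").encode("utf-8")


# Hypothetical byte values, inferred from the rendered diff lines.
samples = [b"AMD\xa1", b"AKGEY\xc3\x9dK", b"AMERICA"]

for before in samples:
    after = reencode_latin1_as_utf8(before)
    print(before, "->", after)
```

Under this assumption, pure-ASCII tokens such as AMERICA pass through
unchanged, while a token that was already stored as UTF-8 would end up
encoded twice, which would match tokens growing longer in the after dump.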

Jason



