garbage removal
Barry Gould
BarryGould at PennySaverUSA.net
Thu May 8 20:21:32 CEST 2003
Because my good and spam databases are large (22MB and 5MB), I decided to try
dropping all the words with count=1, as previously suggested.
However, on the spam db, I get:
# bogoutil -d spamlist.db | bogoutil -l spamlist.db.new -c 1
bogoutil: Unexpected input [sÛ] on line 4. Expecting whitespace before count
#
Those look like non-ASCII characters.
Is there another command I can (or should) run to remove garbage like this
from the databases?
The bogofilter version is 0.10.0.
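
A possible workaround, offered only as a sketch: filter the text dump before reloading it, keeping only lines that look well formed. It assumes that each line produced by `bogoutil -d` is a token followed by whitespace and a numeric count, optionally followed by a numeric date. That layout is inferred from the error message above and has not been checked against 0.10.0. The function name `filter_dump` is made up for this example.

```shell
# filter_dump: hypothetical helper, not part of bogofilter.
# Reads a word-list dump on stdin and prints only lines of the assumed form
#   token count [date]
# where count and date consist solely of digits. Everything else is dropped.
filter_dump() {
    LC_ALL=C awk '
        NF >= 2 && NF <= 3 {
            ok = 1
            for (i = 2; i <= NF; i++)
                if ($i !~ /^[0-9]+$/) ok = 0
            if (ok) print
        }'
}

# Intended use (same bogoutil options as in the commands in this message):
#   bogoutil -d spamlist.db | filter_dump | bogoutil -l spamlist.db.new -c 1
```

Lines that fail the check are discarded silently, so it is worth comparing the line counts of the raw and filtered dumps before replacing the original database.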
Thanks,
Barry
At 09:24 AM 4/16/2003, Alejandro Dau wrote:
>PS: To make a trimmed down db for the tests you can do:
>
>bogoutil -d /tmp/complete/goodlist.db | bogoutil -l /tmp/trimmed/goodlist.db.new -c 1
>mv /tmp/trimmed/goodlist.db.new /tmp/trimmed/goodlist.db
>bogoutil -d /tmp/complete/spamlist.db | bogoutil -l /tmp/trimmed/spamlist.db.new -c 1
>mv /tmp/trimmed/spamlist.db.new /tmp/trimmed/spamlist.db
>
>And then invoke bogofilter with options -d /tmp/complete or -d /tmp/trimmed