garbage removal

Barry Gould BarryGould at PennySaverUSA.net
Thu May 8 20:21:32 CEST 2003


Because my good and spam dbs are large (22MB and 5MB), I decided to try
dropping all the words with count=1, as previously suggested.

However, on the spam db, I get:
# bogoutil -d spamlist.db |  bogoutil -l spamlist.db.new -c 1
bogoutil: Unexpected input [sÛ‘Œ] on line 4. Expecting whitespace before count
#

Those look like non-ASCII characters.

Is there another command I can (should?) run to remove garbage like this 
from the dbs?
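
One possible workaround, as an untested sketch: put a filter between the
dump and the load that drops any line that does not look like a token
followed by a numeric count. This assumes that bogoutil -d writes one
whitespace-separated record per line (token, count, and possibly a date
field), which has not been checked against the 0.10.0 format. The name
filter_dump is made up for illustration and is not part of bogoutil.

```shell
# Hypothetical helper (not part of bogoutil): keep only dump lines that look
# like "token count" or "token count date", with numeric count and date.
# Assumption: bogoutil -d writes one whitespace-separated record per line.
filter_dump() {
    LC_ALL=C awk '
        NF == 2 && $2 ~ /^[0-9]+$/ { print; next }
        NF == 3 && $2 ~ /^[0-9]+$/ && $3 ~ /^[0-9]+$/ { print; next }
    '
}

# Usage, mirroring the failing command above (left commented out so that
# defining the function has no side effects):
# bogoutil -d spamlist.db | filter_dump | bogoutil -l spamlist.db.new -c 1
```

Any record whose token itself contains whitespace gets dropped along with
the real garbage, and awk implementations may differ in how they treat NUL
bytes, so it seems wise to keep a copy of the original db and compare line
counts before and after filtering.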

bogofilter is 0.10.0

Thanks,
Barry

At 09:24 AM 4/16/2003, Alejandro Dau wrote:

>PS: To make a trimmed down db for the tests you can do:
>
>bogoutil -d /tmp/complete/goodlist.db | bogoutil -l /tmp/trimmed/goodlist.db.new -c 1
>mv /tmp/trimmed/goodlist.db.new /tmp/trimmed/goodlist.db
>bogoutil -d /tmp/complete/spamlist.db | bogoutil -l /tmp/trimmed/spamlist.db.new -c 1
>mv /tmp/trimmed/spamlist.db.new /tmp/trimmed/spamlist.db
>
>And then invoke bogofilter with options -d /tmp/complete or -d /tmp/trimmed




