Clean the database from non-spam mails?

Chris Wilkes cwilkes-bf at ladro.com
Wed Dec 3 03:57:47 CET 2003


On Tue, Dec 02, 2003 at 07:26:43PM -0500, David Relson wrote:
> > 
> > The "enlarge" in my email message is in the body, not in the subject
> > line.
> > 
> > > To check the whole operation, I'd do something like:
> > > 
> > >     bogoutil -d /tmp/orig/wordlist.db | tee orig.tmp | wc -l
> > >     bogoutil -d /tmp/onlyspam/wordlist.db | tee spam.tmp | wc -l
> > >     diff orig.tmp spam.tmp
> > 
> Your output looks great -- except for the "enlarge ... 0.415000" from
> bogofilter.  The different word counts sound good, as well.
> 
> Next to try is telling bogofilter to display what's happening in the
> datastore code.  Add flags "-v -x d" when you run it.

The "-x d" didn't show me anything.  A tarball of my text wordlist (it's
a little smaller gzipped than the .db file) and the enlarge email is
available at:
  http://ladro.com/bf/bf-enlarge2.tar.gz

To show the problem:

 mkdir /tmp/orig /tmp/onlyspam
 cd /tmp/orig
 wget http://ladro.com/bf/bf-enlarge2.tar.gz
 tar -xvzf bf-enlarge2.tar.gz
 bogoutil -l ./wordlist.db < wordlist.txt
 awk '$2 > $3 {print}' wordlist.txt | bogoutil -l /tmp/onlyspam/wordlist.db
 bogofilter -v -x d -d /tmp/orig     < Enlargeemail.txt
 bogofilter -v -x d -d /tmp/onlyspam < Enlargeemail.txt
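
As a rough illustration of what the awk step above keeps: assuming the
text wordlist uses a "token spamcount goodcount date" layout (an
assumption about the dump format, not confirmed in this thread), the
filter passes only tokens whose spam count is strictly greater than
their good count.  The rows and the file names sample.txt and
onlyspam.txt below are made up for the sketch.

```shell
# Hypothetical sample rows in "token spamcount goodcount date" layout.
# These values are invented for illustration, not taken from the tarball.
cat > sample.txt <<'EOF'
enlarge 12 0 20031201
meeting 1 30 20031201
free 40 5 20031130
hello 7 7 20031129
EOF

# Keep only rows whose second column (spam) exceeds the third (good).
# Rows with equal counts, such as "hello", are dropped as well.
awk '$2 > $3 {print}' sample.txt > onlyspam.txt

# Show what survived the filter.
cat onlyspam.txt
```

Note that a token seen equally often in spam and non-spam is dropped by
this test, so the spam-only list can come out smaller than expected.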

Chris



