Clean the database from non-spam mails?
Chris Wilkes
cwilkes-bf at ladro.com
Wed Dec 3 03:57:47 CET 2003
On Tue, Dec 02, 2003 at 07:26:43PM -0500, David Relson wrote:
> >
> > The "enlarge" in my email message is in the body, not in the subject
> > line.
> >
> > > To check the whole operation, I'd do something like:
> > >
> > > bogoutil -d /tmp/orig/wordlist.db | tee orig.tmp | wc -l
> > > bogoutil -d /tmp/onlyspam/wordlist.db | tee spam.tmp | wc -l
> > > diff orig.tmp spam.tmp
> >
> Your output looks great -- except for the "enlarge ... 0.415000" from
> bogofilter. The different word counts sounds good, as well.
>
> Next to try is telling bogofilter to display what's happening in the
> datastore code. Add flags "-v -x d" when you run it.
The "-x d" didn't show me anything. A tarball of my text wordlist (it's
a little smaller gzipped than the .db file) and the enlarge email is
available at:
http://ladro.com/bf/bf-enlarge2.tar.gz
To show the problem:
mkdir /tmp/orig /tmp/onlyspam
cd /tmp/orig
wget http://ladro.com/bf/bf-enlarge2.tar.gz
tar -xvzf bf-enlarge2.tar.gz
bogoutil -l ./wordlist.db < wordlist.txt
awk '$2 > $3 {print}' wordlist.txt | bogoutil -l /tmp/onlyspam/wordlist.db
bogofilter -v -x d -d /tmp/orig < Enlargeemail.txt
bogofilter -v -x d -d /tmp/onlyspam < Enlargeemail.txt
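For anyone who wants to see what the awk filter keeps without downloading the tarball, here is a rough sketch on a made-up three-line sample. It assumes the text wordlist has the columns token, spam count, good count, date (which is what the '$2 > $3' test above implies); the tokens and counts below are invented for illustration and are not taken from my wordlist, and only_spam is just a name I picked for the wrapper.

```shell
# Hypothetical sample in the assumed layout: token, spam count, good count, date.
sample='enlarge 12 0 20031201
meeting 1 9 20031201
mortgage 30 2 20031202'

# Same filter as in the steps above: keep only lines whose
# spam count (field 2) is greater than the good count (field 3).
only_spam() {
    awk '$2 > $3 {print}'
}

# Run the sample through the filter and show what survives.
result=$(printf '%s\n' "$sample" | only_spam)
printf '%s\n' "$result"
```

Only the lines with more spam than good hits survive, so the "meeting" line is dropped and the other two would be what gets loaded into the onlyspam wordlist.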
Chris