Clean the database from non-spam mails?

Chris Wilkes cwilkes-bf at ladro.com
Tue Dec 2 19:56:53 CET 2003


On Tue, Dec 02, 2003 at 07:14:10PM +0100, Johannes Klug wrote:
> 
> I'd like to remove all non-spam emails from my database. I
> trained bogofilter only with about 200 ham emails, now my ham-box
> is about 700.

I did a little experiment: I dropped every word whose spam count is not
higher than its ham count by filtering the output of bogoutil -d, which
has four columns:
  1  word
  2  spam count
  3  ham  count
  4  date updated

  # bogoutil -d ./wordlist.db | awk '$2 > $3 {print}' | bogoutil -l ./new.db
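
To make the effect of that awk condition concrete, here is a minimal sketch
that runs the same filter over a few invented rows in the four-column layout
listed above. The function names sample_dump and keep_spammy, and all of the
sample words, counts, and dates, are hypothetical; in real use the input
would come from bogoutil -d instead.

```shell
# Hypothetical stand-in for `bogoutil -d ./wordlist.db`.
# Columns: word, spam count, ham count, date updated (all values invented).
sample_dump() {
  printf '%s\n' \
    'enlarge 87 0 20031202' \
    'meeting 2 40 20031201' \
    'hello 5 5 20031130'
}

# Same condition as the pipeline above: keep a row only when its
# spam count (field 2) is strictly greater than its ham count (field 3).
keep_spammy() {
  awk '$2 > $3 {print}'
}

sample_dump | keep_spammy
```

With this sample only the 'enlarge' row survives. Note that the comparison
is strict, so a word with equal spam and ham counts is dropped as well.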

However, this causes all my emails that used to register a spamicity of
1.0000 to fall to 0.415.  Take one obvious spam word, 'enlarge', as an
example:

  # bogoutil -d /tmp/orig/wordlist.db |  \
    awk '$2 > $3 {print}' | bogoutil -l /tmp/onlyspam/wordlist.db
  
  # bogoutil -w /tmp/orig/wordlist.db enlarge
                                 spam   good
                     enlarge       87      0

  # bogoutil -w /tmp/onlyspam/wordlist.db enlarge
                                 spam   good
                     enlarge       87      0

An email with 'enlarge' in it ($s = email message file):

  # bogofilter -vvv -d /tmp/orig/     -I $s | grep enlarge
    "enlarge"         87  0.000000  0.021712  0.999933 +

  # bogofilter -vvv -d /tmp/onlyspam/ -I $s | grep enlarge
    "enlarge"          0  0.000000  0.000000  0.415000 -

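bogofilter doesn't seem to pick the word up now, even though bogoutil -w
reports the same counts (87 spam, 0 good) in both wordlists.  Did I screw
something up when creating the new wordlist.db file?  The spam score of
that email went from 1.0000 to 0.415000.  I'm running version 0.15.9.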
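
One way to check what the filter actually threw away is to list the rows
that fail the '$2 > $3' test and count both groups. The following is only a
sketch under the same assumptions as the dump layout above: sample_dump and
report_dropped are hypothetical names and the rows are invented. Against a
real database the input would be the output of bogoutil -d on the original
wordlist.db.

```shell
# Hypothetical stand-in for `bogoutil -d /tmp/orig/wordlist.db`.
# Columns: word, spam count, ham count, date updated (all values invented).
sample_dump() {
  printf '%s\n' \
    'enlarge 87 0 20031202' \
    'meeting 2 40 20031201' \
    'hello 5 5 20031130'
}

# Print the word of every row the '$2 > $3' filter would drop,
# then a one-line summary of how many rows were kept and dropped.
report_dropped() {
  awk '
    $2 > $3 { kept++; next }                  # row survives the filter
            { dropped++; print "dropped:", $1 }
    END     { print kept + 0, "kept,", dropped + 0, "dropped" }
  '
}

sample_dump | report_dropped
```

Anything in the "dropped" list that is not an ordinary word from a message
would be worth a closer look before blaming the new wordlist.db itself.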

Chris



