importing words from popfile into bogofilter

David Relson relson at osagesoftware.com
Sat Sep 3 23:11:28 CEST 2011


On Sat, 3 Sep 2011 06:47:14 -0700 (PDT)
Joseph Harth wrote:

> Thanks David. What I did, and I don't know if it worked, was this:
> I copied all the words into a message inside an mbox file and then
> loaded them as spam/ham respectively, but this is probably wrong?

Hi Joseph,

I think what you've done may be sub-optimal, possibly not even useful.

Part of the Bayesian nature of bogofilter is to know how often words
appear in spam and ham.  In particular, bogofilter likes to know that
"xxx" appears in y% of spam messages and in z% of ham messages.  With
these numbers, the appearance of "xxx" can be judged as good or bad.

Consider the following wordlist entries:

   .MSG_COUNT 1000 100
   xxx 500 90

These values indicate that:
 1000 spam messages have been processed and 500 of them contained xxx,
      for a rate of 50%;
  100 ham messages have been processed and 90 of them contained xxx,
      for a rate of 90%.

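With numbers like the above, the appearance of xxx indicates the
message is more likely good than bad, because xxx shows up in a larger
share of ham than of spam (even though the raw spam count is higher).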
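
The arithmetic above can be sketched in a few lines of Python. This is
only an illustration: the function names are made up for this example,
and the unsmoothed ratio at the end is a simplification, not
bogofilter's actual scoring, which applies further smoothing and
combines the evidence from many tokens.

```python
# Counts taken from the example above; all names here are illustrative.
spam_msgs, ham_msgs = 1000, 100   # the .MSG_COUNT line
spam_hits, ham_hits = 500, 90     # the "xxx" line

def token_frequencies(spam_hits, ham_hits, spam_msgs, ham_msgs):
    """Return the fraction of spam and of ham messages containing the token."""
    return spam_hits / spam_msgs, ham_hits / ham_msgs

def spamminess(spam_freq, ham_freq):
    """Unsmoothed ratio (an assumption for illustration only).

    Below 0.5 the token leans ham, above 0.5 it leans spam.
    """
    return spam_freq / (spam_freq + ham_freq)

spam_freq, ham_freq = token_frequencies(spam_hits, ham_hits, spam_msgs, ham_msgs)
score = spamminess(spam_freq, ham_freq)
print(f"spam {spam_freq:.0%}, ham {ham_freq:.0%}, spamminess {score:.3f}")
# prints: spam 50%, ham 90%, spamminess 0.357
```

By the same arithmetic, a wordlist built from a single message per
class would presumably record every word as appearing in 1 of 1
messages, so the rates would say nothing about how common a word
really is.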

With the wordlist you've created, run "bogoutil -d wordlist.db" to
display your wordlist as text and see if you like the results.  
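
To judge the dumped text at a glance, something like the following
Python sketch may help. It assumes each dumped line has the form
"token spam_count ham_count", as in the example above (any trailing
fields are ignored), and it reads from a sample string rather than
from real bogoutil output. The helper names are hypothetical.

```python
# Sample text in the assumed "token spam_count ham_count" layout.
SAMPLE_DUMP = """\
.MSG_COUNT 1000 100
xxx 500 90
"""

def parse_dump(text):
    """Parse 'token spam_count ham_count [extra...]' lines into a dict."""
    counts = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        counts[fields[0]] = (int(fields[1]), int(fields[2]))
    return counts

def report(counts):
    """Return (token, spam_rate, ham_rate) for every ordinary token."""
    spam_msgs, ham_msgs = counts[".MSG_COUNT"]
    rows = []
    for token, (spam, ham) in counts.items():
        if token.startswith("."):
            continue  # skip the .MSG_COUNT entry in this sketch
        spam_rate = spam / spam_msgs if spam_msgs else 0.0
        ham_rate = ham / ham_msgs if ham_msgs else 0.0
        rows.append((token, spam_rate, ham_rate))
    return rows

for token, spam_rate, ham_rate in report(parse_dump(SAMPLE_DUMP)):
    print(f"{token}: {spam_rate:.0%} of spam, {ham_rate:.0%} of ham")
```

If nearly every token comes out at the same rate, the wordlist is
probably not carrying useful frequency information.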

Alternatively, you can test simple messages using "echo" or file
redirection, for example:

   echo this is a test | bogofilter -H -v

   bogofilter -v < test_message.txt

Of course, you could just nuke the wordlist you've created and
train bogofilter with a bunch of saved ham and spam messages.
That would ensure that bogofilter's wordlist is structured the
way bogofilter expects it to be.

Regards,

David


