importing words from popfile in to bogofilter
David Relson
relson at osagesoftware.com
Sat Sep 3 23:11:28 CEST 2011
On Sat, 3 Sep 2011 06:47:14 -0700 (PDT)
Joseph Harth wrote:
> Thanks David. What I did, and I don't know if it worked, was this: I
> copied all the words into a message inside an mbox file and then loaded
> them as spam/ham respectively, but this is probably wrong?
Hi Joseph,
I think what you've done may be sub-optimal, possibly not even useful.
Part of the Bayesian nature of bogofilter is to know how often words
appear in spam and ham. In particular, bogofilter likes to know that
"xxx" appears in y% of spam messages and in z% of ham messages. With
these numbers, the appearance of "xxx" can be judged as good or bad.
Consider the following:
  .MSG_COUNT 1000 100
  xxx 500 90
These values indicate that:
  1000 spam have been processed and 500 of them had xxx, for a score of 50%
  100 ham have been processed and 90 of them had xxx, for a score of 90%
With numbers like the above, the appearance of xxx indicates the
message is more likely good than bad.
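To make the arithmetic concrete, here is a small Python sketch using the
numbers from the example above. The function name and the "spam lean" ratio
are my own illustration, not bogofilter's actual scoring formula, which is
more involved; this only shows why per-message counts matter.

```python
def token_frequencies(spam_msgs, ham_msgs, spam_hits, ham_hits):
    """Return the fraction of spam and of ham messages containing a token."""
    return spam_hits / spam_msgs, ham_hits / ham_msgs


# Values from the example: ".MSG_COUNT 1000 100" and "xxx 500 90"
spam_freq, ham_freq = token_frequencies(1000, 100, 500, 90)

# Illustrative simplification (an assumption, not bogofilter's real formula):
# the share of the token's combined frequency that comes from spam.
# Below 0.5 leans ham, above 0.5 leans spam.
spam_lean = spam_freq / (spam_freq + ham_freq)

print(f"spam frequency: {spam_freq:.0%}")   # 50%
print(f"ham frequency:  {ham_freq:.0%}")    # 90%
print(f"spam lean:      {spam_lean:.3f}")   # 0.357
```

Note that the raw counts alone (500 vs. 90) would suggest the opposite
conclusion. It is the division by the message counts that makes xxx look
like a ham word, which is why a wordlist built without realistic message
counts is of doubtful value.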
With the wordlist you've created, run "bogoutil -d wordlist.db" to
display your wordlist as text and see if you like the results.
Alternatively, you can test simple messages using "echo" or file
redirection, for example:
  echo this is a test | bogofilter -H -v
  bogofilter -v < test_message.txt
Of course, you could just nuke the wordlist you've created and
train bogofilter with a bunch of saved ham and spam messages.
That would ensure that bogofilter's wordlist is structured the
way bogofilter expects it to be.
Regards,
David