problem in training
David Relson
relson at osagesoftware.com
Wed Sep 29 00:23:31 CEST 2004
On Tue, 28 Sep 2004 19:43:55 +0000
Yair Zohar wrote:
> I've tried these:
> #> ./randomtrain -s ~/junk/Maildir/tmp/ -n ~/junk/Maildir/cur/
Hi Yair,
For the initial training of bogofilter, randomtrain is the wrong tool.
Just use bogofilter's -s and -n options directly:
    bogofilter -v -s -B ~/junk/Maildir/tmp
    bogofilter -v -n -B ~/junk/Maildir/cur
The "-v" tells bogofilter to print some info on what it's done; "-s" is
"register as spam" and "-n" is "register as non-spam"; "-B" tells it to
read messages from the following file (or directory).
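If you want both steps in one place, something like the following might do. It is only a sketch: train_initial is a made-up helper name (not part of bogofilter), and the BOGOFILTER variable for picking a different binary is my own assumption. It uses only the -v, -s, -n and -B options described above.

```shell
#!/bin/sh
# Sketch only: train_initial is a hypothetical helper, not part of bogofilter.
# It runs the two registration commands from this message, one per input.
# BOGOFILTER (an assumption of this sketch) can name an alternative binary.
train_initial() {
    spam_dir=$1
    ham_dir=$2
    # -B accepts a file or a directory, so only check that each input exists.
    for d in "$spam_dir" "$ham_dir"; do
        if [ ! -e "$d" ]; then
            echo "train_initial: no such file or directory: $d" >&2
            return 1
        fi
    done
    # -s registers the messages as spam, -n as non-spam.
    "${BOGOFILTER:-bogofilter}" -v -s -B "$spam_dir" || return 1
    "${BOGOFILTER:-bogofilter}" -v -n -B "$ham_dir"
}

# Example call, using the directories from this thread:
# train_initial ~/junk/Maildir/tmp ~/junk/Maildir/cur
```

The function stops before the non-spam step if the spam step fails, so a half-finished run is easy to notice.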
As you discovered, "cat 'unwanted mail ...' | bogofilter -vvv" isn't
quite right. Using echo works better, e.g.
    echo this is a test | bogofilter -vvv
gives me:
X-Bogosity: No, tests=bogofilter, spamicity=0.520000, version=0.92.6
                    n     pgood      pbad        fw  U
"head:test"       149  0.000737  0.001426  0.659387  -
"head:this"      3133  0.014785  0.030806  0.675710  -
N_P_Q_S_s_x_md      0  0.000000  0.000000  0.520000
                       0.017800  0.520000  0.375000
Of course the numbers you get will be different since your wordlist is
different. Also, you may notice that each token has a prefix of
"head:". This happens because bogofilter adds prefix tags to the
various header lines of a message. This can be turned off with the "-H"
flag, i.e.
[relson at osage cvs]$ echo this is a test | bogofilter -vvv -H
X-Bogosity: No, tests=bogofilter, spamicity=0.105823, version=0.92.6
                    n     pgood      pbad        fw  U
"test"           5740  0.068543  0.008112  0.105823  +
"this"          74773  0.473252  0.594872  0.556932  -
N_P_Q_S_s_x_md      1  0.894177  0.105823  0.105823
                       0.017800  0.520000  0.375000
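A note on reading the U column, offered as my interpretation rather than anything stated in the output itself: assuming the last row lists the s, x and md (minimum deviation) parameters, a token only seems to count toward the score when its fw value is at least md away from 0.5. The sketch below checks that rule against the four tokens shown; token_used is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch under assumptions: token_used is a hypothetical helper, and the
# rule "|fw - 0.5| >= md" is an interpretation of the output above.
token_used() {
    awk -v fw="$1" -v md="$2" 'BEGIN {
        dev = fw - 0.5              # distance of the token score from neutral
        if (dev < 0) dev = -dev
        print ((dev >= md + 0) ? "+" : "-")
    }'
}

token_used 0.659387 0.375   # "head:test"
token_used 0.675710 0.375   # "head:this"
token_used 0.105823 0.375   # "test"
token_used 0.556932 0.375   # "this"
```

Under that assumption only "test" qualifies, which agrees with the count of 1 in the second run and the count of 0 in the first, where the score stays at 0.520000.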
HTH,
David