problem in training

David Relson relson at osagesoftware.com
Wed Sep 29 00:23:31 CEST 2004


On Tue, 28 Sep 2004 19:43:55 +0000
Yair Zohar wrote:

> I've tried these:
> #> ./randomtrain -s ~/junk/Maildir/tmp/ -n ~/junk/Maildir/cur/

Hi Yair,

For initial training, randomtrain is the wrong tool.  Just use
bogofilter's -s and -n options directly, i.e.

bogofilter -v -s -B ~/junk/Maildir/tmp
bogofilter -v -n -B ~/junk/Maildir/cur

The "-v" tells bogofilter to print some info on what it's done; "-s" is
"register as spam" and "-n" is "register as non-spam"; "-B" tells it to
read messages from the following file (or directory).
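
If you'd like to see the commands before committing anything to the
wordlist, something like the following might help.  It is only a sketch:
the function name train_initial and the DRY_RUN variable are made up for
this example and are not part of bogofilter.  The only bogofilter options
it uses are the ones described above.

```shell
#!/bin/sh
# train_initial SPAM_DIR HAM_DIR
# Hypothetical helper (not shipped with bogofilter).  By default it only
# prints the two training commands; set DRY_RUN=0 to actually run them.
train_initial() {
    spam_dir=$1
    ham_dir=$2

    # Refuse to continue if either argument is not a directory.
    for d in "$spam_dir" "$ham_dir"; do
        if [ ! -d "$d" ]; then
            echo "not a directory: $d" >&2
            return 1
        fi
    done

    run=echo                                  # dry run: print the command
    [ "${DRY_RUN:-1}" = "0" ] && run=         # real run: execute it

    $run bogofilter -v -s -B "$spam_dir"      # register as spam
    $run bogofilter -v -n -B "$ham_dir"       # register as non-spam
}
```

Calling "train_initial ~/junk/Maildir/tmp ~/junk/Maildir/cur" prints the
two commands shown above; run it again with DRY_RUN=0 once they look right.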

As you discovered, "cat 'unwanted mail ...'  | bogofilter -vvv" isn't
quite right.  Using echo works better, i.e.

echo this is a test | bogofilter -vvv

gives me:

X-Bogosity: No, tests=bogofilter, spamicity=0.520000, version=0.92.6
                                      n    pgood     pbad      fw     U
"head:test"                         149  0.000737  0.001426  0.659387 -
"head:this"                        3133  0.014785  0.030806  0.675710 -
N_P_Q_S_s_x_md                        0  0.000000  0.000000  0.520000
                                         0.017800  0.520000  0.375000

Of course the numbers you get will be different since your wordlist is
different.  Also, you may notice that each token has a prefix of
"head:".  This happens because bogofilter adds prefix tags to tokens
taken from the various header lines of a message.  The tagging can be
turned off with the "-H" flag, i.e.

[relson@osage cvs]$ echo this is a test | bogofilter -vvv -H

X-Bogosity: No, tests=bogofilter, spamicity=0.105823, version=0.92.6
                                      n    pgood     pbad      fw     U
"test"                             5740  0.068543  0.008112  0.105823 +
"this"                            74773  0.473252  0.594872  0.556932 -
N_P_Q_S_s_x_md                        1  0.894177  0.105823  0.105823
                                         0.017800  0.520000  0.375000
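
For the curious, the "fw" column in both listings appears consistent with
Robinson's f(w) = (s*x + n*p) / (s + n), where p = pbad / (pgood + pbad).
The sketch below recomputes it with awk.  It ASSUMES that the first two
numbers on the last output line (0.017800 and 0.520000) are s and x; the
function name robinson_fw is made up for this example.  Because the
printed pgood and pbad values are rounded, the results agree with the
listings only to about three decimal places.

```shell
#!/bin/sh
# robinson_fw N PGOOD PBAD [S] [X]
# Sketch of Robinson's f(w) = (s*x + n*p) / (s + n), p = pbad / (pgood + pbad).
# The defaults s=0.0178 and x=0.52 are ASSUMED from the last output line.
robinson_fw() {
    awk -v n="$1" -v pg="$2" -v pb="$3" \
        -v s="${4:-0.0178}" -v x="${5:-0.52}" 'BEGIN {
        # A token never seen before has no p; the formula reduces to x.
        if (pg + pb == 0) { printf "%.4f\n", x; exit }
        p = pb / (pg + pb)
        printf "%.4f\n", (s * x + n * p) / (s + n)
    }'
}

robinson_fw 5740  0.068543 0.008112    # "test" with -H, prints 0.1058
robinson_fw 74773 0.473252 0.594872    # "this" with -H, prints 0.5569
```

In the second listing only "test" is marked "+", and the final spamicity
equals its fw value; in the first listing no token is marked "+" and the
spamicity equals x.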


HTH,

David


