[bogofilter] Training from scratch

David Relson relson at osagesoftware.com
Tue Mar 8 01:15:35 CET 2005


On Mon, 07 Mar 2005 23:30:42 +1000
Mark Constable wrote:

> David Relson wrote:
> >>. xfilter "/usr/bin/bogofilter -u -e -p -d $HOME/.bogofilter"
> >>. bogofilter -d${MPATH}.bogofilter -Ns < "$i"
> >>. bogofilter -d${MPATH}.bogofilter -Sn < "$i"
> 
> > *** Also you can use "bogofilter -vvv <msg" to see the spam and ham
> > counts and the spamicity scores for each token in the message.  The
> > "-vvv" output is described in the FAQ.
> 
> Thank you. These are FAQ answers but I wasn't sure which answers
> to apply in this case. There is something screwy with the 
> initial input, just now I tracked another test message as before
> and it arrived in the Unsure folder with the same result as before,
>  
> . Mar  7 23:01:40 mail bogofilter[3721]: X-Bogosity: Unsure, \
> .  spamicity=0.520000, version=0.93.5

The 0.52 value is the default score for unknown words.  It's also the
score given to a message composed entirely of unknown words.  My nickel
says that bogofilter doesn't have the proper path for the wordlist.
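
As a rough sketch of how one might spot this symptom in a log, the
snippet below pulls the spamicity out of an X-Bogosity line and checks
whether it is exactly the 0.52 default.  The helper names
(spamicity_of, looks_like_unknown_words) are made up for illustration
and are not part of bogofilter.  A match only hints that no token was
found in the wordlist; it doesn't prove the path is wrong.

```shell
# Hypothetical helper: extract the spamicity value from an X-Bogosity line.
spamicity_of() {
    printf '%s\n' "$1" | sed -n 's/.*spamicity=\([0-9.]*\).*/\1/p'
}

# Hypothetical helper: succeed when the score equals the 0.52 default
# that bogofilter assigns to unknown words.
looks_like_unknown_words() {
    [ "$(spamicity_of "$1")" = "0.520000" ]
}

# The log line from the message quoted above, with the continuation joined.
line='X-Bogosity: Unsure, spamicity=0.520000, version=0.93.5'

if looks_like_unknown_words "$line"; then
    echo "score is the unknown-word default; check the wordlist path"
fi
```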

What's the value of ${MPATH}?  

Is the proper path ${MPATH}.bogofilter (which looks suspect, though it
may be fine) or ${MPATH}/.bogofilter?
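
To show why the slash matters, here is a small sketch of how the two
spellings expand.  I don't know the real value of ${MPATH}, so the
values below are assumptions, and "candidates" is a made-up helper
name.  The point is only that ${MPATH}.bogofilter is correct when
MPATH ends in a slash and wrong when it doesn't.

```shell
# Hypothetical helper: print both candidate wordlist directories
# for a given MPATH value.
candidates() {
    mpath=$1
    printf '%s\n' "${mpath}.bogofilter" "${mpath}/.bogofilter"
}

# Assumed value without a trailing slash:
candidates /home/cherry
#   /home/cherry.bogofilter     <- not inside the home directory
#   /home/cherry/.bogofilter

# Assumed value with a trailing slash:
candidates /home/cherry/
#   /home/cherry/.bogofilter
#   /home/cherry//.bogofilter   <- doubled slash, still resolves to the same directory
```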

 
> however, when I recheck the actual message in that Unsure folder 
> immediately after it arrives I get...
> 
> # bogofilter -vv < long_courier_msg_name
> X-Bogosity: Ham, tests=bogofilter, spamicity=0.107097, version=0.93.5
>    int  cnt   prob  spamicity histogram
>   0.00   19 0.016756 0.013088 ###################
>   0.10    3 0.115524 0.027034 ###
>   0.20    0 0.000000 0.027034
>   0.30    0 0.000000 0.027034
>   0.40    0 0.000000 0.027034
>   0.50    0 0.000000 0.027034
>   0.60    0 0.000000 0.027034
>   0.70    0 0.000000 0.027034
>   0.80    0 0.000000 0.027034
>   0.90    3 0.996235 0.363943 ###
> 
> which is more like what I would expect. There is something screwy
> about how I am initially feeding the message into bogofilter... 
> (see below)... here is the beginning of my maildroprc and the log
> entry for the above message in case anyone can see an obvious booboo.

The above looks good (as you know).  Adding flags "-U -vv" to the
xfilter line would include scoring info in the received messages.

 
> # head /etc/courier/maildroprc
> logfile "/var/log/bogofilter.log"
> log $HOME
> xfilter "/usr/bin/bogofilter -l -u -e -p -c /etc/bogofilter.cf -d $HOME/.bogofilter"
> 
> if ((/^X-Bogosity: Spam/:h))
> {
> `test -d $HOME/Maildir/.Trash`
> if ($RETURNCODE)
> {
>      /^Subject: *!.*/:h
> ...
> ---------------------------------------
> /home/cherry/
> Date: Mon Mar  7 23:01:40 2005
> From: Mark Constable <markc at renta.net>
> Subj: More testing
> File: /home/cherry//Maildir/.Suspect          (1159)
> 
> --markc

That looks reasonable.  Adding flags "-x d -v" to your bogofilter
command will add some database debugging info (on stderr).  I'm not
sure whether it'll end up in /var/log/messages or /var/log/syslog,
though I think it'll land in one or the other.

Looking forward to the next installment of this saga :-)

Regards,

David

_______________________________________________
Bogofilter mailing list
Bogofilter at bogofilter.org
http://www.bogofilter.org/mailman/listinfo/bogofilter


