[bogofilter] Training from scratch
David Relson
relson at osagesoftware.com
Tue Mar 8 01:15:35 CET 2005
On Mon, 07 Mar 2005 23:30:42 +1000
Mark Constable wrote:
> David Relson wrote:
> >>. xfilter "/usr/bin/bogofilter -u -e -p -d $HOME/.bogofilter"
> >>. bogofilter -d${MPATH}.bogofilter -Ns < "$i"
> >>. bogofilter -d${MPATH}.bogofilter -Sn < "$i"
>
> > *** Also you can use "bogofilter -vvv <msg" to see the spam and ham
> > counts and the spamicity scores for each token in the message. The
> > "-vvv" output is described in the FAQ.
>
> Thank you. These are FAQ answers but I wasn't sure which answers
> to apply in this case. There is something screwy with the
> initial input, just now I tracked another test message as before
> and it arrived in the Unsure folder with the same result as before,
>
> . Mar 7 23:01:40 mail bogofilter[3721]: X-Bogosity: Unsure, \
> . spamicity=0.520000, version=0.93.5
The 0.52 value is the default score for unknown words. It's also the
score given to a message composed entirely of unknown words. My nickel
says that bogofilter doesn't have the proper path for the wordlist.
What's the value of ${MPATH}?
Is the proper path ${MPATH}.bogofilter (which looks suspect, though it
may be fine) or ${MPATH}/.bogofilter?
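
If it helps, here is a rough sketch for checking both candidate paths
from the shell. It assumes the wordlist file has the usual name
wordlist.db (adjust if yours differs); check_wordlist_dir is just a
helper name made up for this example, and the fallback MPATH value is
a placeholder for whatever your training script really sets.

```shell
#!/bin/sh
# Report whether a directory looks like a bogofilter wordlist directory.
# Assumption: the wordlist file is named wordlist.db.
check_wordlist_dir() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo "missing directory: $dir"
        return 1
    fi
    if [ ! -f "$dir/wordlist.db" ]; then
        echo "no wordlist.db in: $dir"
        return 1
    fi
    echo "wordlist found: $dir/wordlist.db"
}

# Placeholder value; use the MPATH from your own training script.
MPATH=${MPATH:-/home/cherry}

# Without a trailing slash on MPATH these two are different directories.
check_wordlist_dir "${MPATH}.bogofilter" || true
check_wordlist_dir "${MPATH}/.bogofilter" || true
```

If only one of the two reports a wordlist, and it isn't the directory
that the xfilter line passes with -d, then the training runs and the
delivery-time scoring are using different databases.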
> however, when I recheck the actual message in that Unsure folder
> immediately after it arrives I get...
>
> # bogofilter -vv < long_courier_msg_name
> X-Bogosity: Ham, tests=bogofilter, spamicity=0.107097, version=0.93.5
> int cnt prob spamicity histogram
> 0.00 19 0.016756 0.013088 ###################
> 0.10 3 0.115524 0.027034 ###
> 0.20 0 0.000000 0.027034
> 0.30 0 0.000000 0.027034
> 0.40 0 0.000000 0.027034
> 0.50 0 0.000000 0.027034
> 0.60 0 0.000000 0.027034
> 0.70 0 0.000000 0.027034
> 0.80 0 0.000000 0.027034
> 0.90 3 0.996235 0.363943 ###
>
> which is more like what I would expect. There is something screwy
> about how I am initially feeding the message into bogofilter...
> (see below)... here is the beginning of my maildroprc and the log
> entry for the above message in case anyone can see an obvious booboo.
The above looks good (as you know). Adding flags "-U -vv" to the
xfilter line would include scoring info in the received messages.
> # head /etc/courier/maildroprc
> logfile "/var/log/bogofilter.log"
> log $HOME
> xfilter "/usr/bin/bogofilter -l -u -e -p -c /etc/bogofilter.cf -d $HOME/.bogofilter"
>
> if ((/^X-Bogosity: Spam/:h))
> {
> `test -d $HOME/Maildir/.Trash`
> if ($RETURNCODE)
> {
> /^Subject: *!.*/:h
> ...
> ---------------------------------------
> /home/cherry/
> Date: Mon Mar 7 23:01:40 2005
> From: Mark Constable <markc at renta.net>
> Subj: More testing
> File: /home/cherry//Maildir/.Suspect (1159)
>
> --markc
That looks reasonable. Adding flags "-x d -v" to your bogofilter
command will add some database debugging info (on stderr). I'm not
sure whether it'll end up in /var/log/messages or /var/log/syslog,
though I think it'll land in one or the other.
Looking forward to the next installment of this saga :-)
Regards,
David
_______________________________________________
Bogofilter mailing list
Bogofilter at bogofilter.org
http://www.bogofilter.org/mailman/listinfo/bogofilter