Problem found (Was: Problem with randomtrain)

Jake Di Toro karrde+bogofilter at viluppo.net
Mon May 19 20:54:25 CEST 2003


On Fri, May 16, 2003 at 04:52:25PM -0400, Jake Di Toro wrote:
> 
> I seem to be having a problem getting randomtrain to work correctly.
> 
> I created a directory and word list with a small subset of my
> spam/ham, and then ran randomtrain against it.  After a third run it
> no longer had to train to "properly classify" the mail.  But when I
> ran one of the training mboxes through bogofilter to see what kind of
> scores came up, I received a mix of ham/unsures and spam/unsures.
> 
> some data follows, all commands run in ~/tmp/bogofilter:

[ .oO Data Snipped Oo. ]

Well, looking into matters further, I tracked down what the problem
seemed to be.  It seems that randomtrain does NOT use the config file,
except when it trains against an error.

The first call to bogofilter for each msg has '-c cfg.$pid', but
cfg.$pid is only referenced in the cleanup code at the end of the
script.  The command line parser defines only '$cfg', which is either
what the user specified on the command line or the default '-C'.  I
double checked against the 0.12.3 sources to make sure randomtrain had
not been modified (it hadn't), and have attached a patch to correct
the call to bogofilter.
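
To make the mismatch concrete, here is a rough sketch of the idea.  It
is not the actual randomtrain source: the function names build_cfg and
classify_cmd are made up for illustration, and the parsing is only my
reading of what the script does.  It just prints the command line
instead of running bogofilter.

```shell
#!/bin/sh
# Hypothetical sketch only; build_cfg and classify_cmd are NOT part of
# randomtrain.  They mimic the behaviour described above.

# build_cfg: produce the config argument for bogofilter.
# Assumption: the user may pass a config file name; with none given,
# the default is '-C' (tell bogofilter not to read a config file).
build_cfg() {
    if [ -n "${1:-}" ]; then
        printf '%s\n' "-c $1"
    else
        printf '%s\n' "-C"
    fi
}

# classify_cmd: print the classification command that would be run.
# The point of the fix: pass the parsed $cfg here, rather than a
# hard-coded '-c cfg.$pid' that nothing else in the script sets up.
classify_cmd() {
    bogodir=$1
    cfg=$2
    printf '%s\n' "bogofilter -t -v -d $bogodir $cfg"
}
```

With no config given, classify_cmd "$bogodir" "$(build_cfg)" ends in
'-C'; with a file given it ends in '-c <file>', so the classification
pass and the training pass see the same settings.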

When I ran this against my mail sample it correctly trained until
there were no errors or unsures (although that did take 3 runs to
completely clean).

-- 
Till Later, 
Jake <karrde at viluppo.net>
-------------- next part --------------
*** /a/viluppo/tensor/users1/home/karrde/bin/src/bogofilter-0.10.3.1/contrib/randomtrain	Fri Feb 14 19:27:58 2003
--- randomtrain	Mon May 19 14:26:39 2003
***************
*** 41,45 ****
  	let cnt=cnt+1
  	dd if=$fnam bs=1 skip=$offset count=$length 2>/dev/null >msg.$pid
! 	result=`bogofilter -t -v -d $bogodir -c cfg.$pid <msg.$pid | tr "SHU" "shu"`
  	got=`echo $result | awk '{print $1}' | tr "YNU" "snu"`
  	if [ "$expect" = "s" ]; then let nspam=$nspam+1
--- 41,45 ----
  	let cnt=cnt+1
  	dd if=$fnam bs=1 skip=$offset count=$length 2>/dev/null >msg.$pid
! 	result=`bogofilter -t -v -d $bogodir $cfg <msg.$pid | tr "SHU" "shu"`
  	got=`echo $result | awk '{print $1}' | tr "YNU" "snu"`
  	if [ "$expect" = "s" ]; then let nspam=$nspam+1


