bogotune [was: PATCH ... the evasive message discussed on the list]

David Relson relson at osagesoftware.com
Fri Nov 7 18:19:41 CET 2003


On Fri, 7 Nov 2003 10:37:49 -0600
John McCain <jmccain at layer3al.com> wrote:

> On Thursday 06 November 2003 10:31 pm, David Relson wrote:
> 
> > Also, for those inclined to experiment, the attached patch will
> > recognize some forms of the DOCTYPE line and switch to HTML mode. 
> > If there's merit to it, perhaps it should become official.
> 
> Well, the results are in on the lexer patch.  I've included the
> bogotune output so that my analysis can be verified, but by the
> numbers, it seems to make bogofilter considerably less accurate.  I'll
> take my crow with hot sauce, please.

I love hot sauce!  It adds spice to life :-)

Your bogotune results are ... interesting ...  

The top 10 from the fine scan look good, as does: 

	Minimum found at s 0.1000, md 0.465, x 0.660
	        fp 6 (1.1834%), fn 15 (1.8821%)

Given that, 

	spam_cutoff=0.993 # for 0.79% false positives; expect 17.82% false neg.

The difference between 17.82% false negative and 1.8821% false negative
doesn't make sense.  I need to dig into the code.
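
As a rough sanity check, the sizes of the two test sets can be backed out
of the counts and percentages quoted above. This is only arithmetic on
those numbers, not anything taken from bogotune itself: the helper name
implied_total is made up for illustration, and it assumes each
percentage is simply count / total, rounded.

```python
def implied_total(count, percent):
    """Estimate a set size from an error count and its percentage.

    Hypothetical helper: assumes percent == 100 * count / total.
    """
    return round(count / (percent / 100.0))


# "fp 6 (1.1834%), fn 15 (1.8821%)" from the reported minimum.
ham_total = implied_total(6, 1.1834)    # non-spam messages scored
spam_total = implied_total(15, 1.8821)  # spam messages scored

# Error counts implied by the percentages on the spam_cutoff line.
fp_at_cutoff = round(ham_total * 0.79 / 100.0)
fn_at_cutoff = round(spam_total * 17.82 / 100.0)

print(ham_total, spam_total, fp_at_cutoff, fn_at_cutoff)
```

If that assumption holds, the spam_cutoff line implies far more missed
spam than the 15 reported at the minimum, for only a couple fewer false
positives, which is the gap that looks suspicious.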

Can you run the attached script (mccain.sh) and send me the output file?
It'll dump your wordlist and convert your test mailboxes into msg-count
format (using bogolex.sh, which is in your bogofilter/src directory).
bogolex.sh parses a message and outputs the tokens with their spam and
ham counts.  Since the output is alphabetized and duplicates are
removed, the message is effectively scrambled and is unintelligible.
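
To make the idea concrete, here is a minimal sketch of that kind of
conversion. It is not bogolex.sh and does not use bogofilter's lexer:
the tokenizer is a toy regular expression, the wordlist is a plain dict,
and the "token spam ham" line layout is an assumption, since the real
output format is not shown in this message.

```python
import re


def msg_count_lines(message, wordlist):
    """Return sorted, de-duplicated tokens with spam and ham counts.

    wordlist maps token -> (spam_count, ham_count); tokens that are
    not in the wordlist are reported with counts of 0 and 0.
    """
    # Toy tokenizer: lowercase runs of letters, digits, and a few symbols.
    tokens = set(re.findall(r"[a-z0-9$'._-]+", message.lower()))
    lines = []
    for token in sorted(tokens):
        spam, ham = wordlist.get(token, (0, 0))
        lines.append(f"{token} {spam} {ham}")
    return lines


# Hypothetical wordlist and message, for illustration only.
sample_wordlist = {"free": (12, 1), "meeting": (0, 9)}
sample_lines = msg_count_lines("Free meeting FREE now", sample_wordlist)
print("\n".join(sample_lines))
```

Because word order and repetition are gone, the original text cannot be
read back from such output, yet each token still carries the counts
needed for scoring.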

The output of mccain.sh will allow me to run bogotune using _your_
wordlists and messages.

By the way, if you can use a larger set of messages for running
bogotune, the results would probably be more reliable.

David
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mccain.sh
Type: application/x-sh
Size: 386 bytes
Desc: not available
URL: <http://www.bogofilter.org/pipermail/bogofilter/attachments/20031107/2aaf2d15/attachment.sh>

