Too many "unsures"

David Relson relson at osagesoftware.com
Tue May 8 05:07:57 CEST 2007


On Mon, 7 May 2007 18:20:02 -0700
Kirrily Robert wrote:

> I installed bogofilter about a month ago and initially trained it on a
> hundred or so of spam and ham.  I then set up a mail filter which puts
> spam in a junk folder and ham/unsure in my inbox.  It also puts a copy
> of unsures into an "unsure" folder, so that I can go and make sure
> that I manually mark them as spam/ham.  I do this via a mutt alias
> which calls bogofilter -n or bogofilter -s on each email.
> 
> Problem is, after a month or so of training, I'm still getting
> hundreds of "unsures" a day.  This morning's batch (164 messages) was
> about 20% spam but the rest ham, much of it from mailing list threads
> that have been talking about the same subject for days or weeks.
> Surely it should've figured out that that thread's ham by now?
> 
> So I looked at the list archives and found this thread:
> http://www.bogofilter.org/pipermail/bogofilter/2007-March/009176.html
> 
> I tried running bogoutil -p on my wordlist.db, but it hung, so I can't
> see whether my spam and ham were getting registered.
> 
> Here's what -V tells me:
> 
> bogofilter version 1.1.5
>     Database: Sleepycat Software: Berkeley DB 3.2.9: (February  1, 2005) AUTO-XA
> 
> Any suggestions?
> 
> K.

Hello Kirrily,

Given your training regime, I'm surprised to hear of your hundreds of
unsures.  Have you looked at the X-Bogosity: lines for them?  It may be
particularly useful to compare the scores of unsure messages that turn
out to be (1) spam with those that turn out to be (2) ham.  You might
wish to change the spam/ham/unsure boundaries in bogofilter's config
file.
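If you want to do that comparison over a whole folder rather than by
eye, a small script helps.  The following is a rough Python sketch, not
part of bogofilter.  It assumes that the header looks like
"X-Bogosity: Unsure, tests=bogofilter, spamicity=0.520000, version=1.1.5"
(check your own mail), that the folder is an mbox file, and that a score
at or above the spam cutoff counts as spam and one at or below the ham
cutoff counts as ham.  The cutoff pairs are hypothetical trial values,
not bogofilter's defaults.

```python
import mailbox
import re
import sys
from collections import Counter

# Assumed header value shape (verify against your own mail):
#   Unsure, tests=bogofilter, spamicity=0.520000, version=1.1.5
HEADER_RE = re.compile(r"^\s*(\w+),.*\bspamicity=(\d+(?:\.\d+)?)")


def parse_bogosity(value):
    """Return (label, spamicity) from an X-Bogosity header value, or None."""
    text = str(value) if value is not None else ""
    m = HEADER_RE.match(text)
    if m is None:
        return None
    return m.group(1).capitalize(), float(m.group(2))


def bucket(score, ham_cutoff, spam_cutoff):
    """Classify a score against trial cutoffs (boundary handling is assumed)."""
    if score >= spam_cutoff:
        return "Spam"
    if score <= ham_cutoff:
        return "Ham"
    return "Unsure"


def tally(scores, ham_cutoff, spam_cutoff):
    """Count how many scores would land in each band for the given cutoffs."""
    return Counter(bucket(s, ham_cutoff, spam_cutoff) for s in scores)


def scores_from_mbox(path):
    """Collect spamicity scores from every message in an mbox folder."""
    scores = []
    for msg in mailbox.mbox(path):
        parsed = parse_bogosity(msg.get("X-Bogosity"))
        if parsed is not None:
            scores.append(parsed[1])
    return scores


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python bogoscores.py /path/to/unsure-folder  (hypothetical name)
    scores = scores_from_mbox(sys.argv[1])
    for ham_cut, spam_cut in [(0.45, 0.99), (0.30, 0.90)]:  # trial values only
        print(ham_cut, spam_cut, dict(tally(scores, ham_cut, spam_cut)))
```

Running it over the unsure folder, then again after sorting that folder
into spam and ham, shows where each kind actually scores and how many
messages a given pair of cutoffs would move out of "Unsure".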

It can also be useful to look at bogofilter's output with the "-vv" or
"-vvv" flags to see which words in a message are causing the unsure
result.  The FAQ has more information on these flags.

Bogoutil is working fine for me.  Its man page describes how to use
the "-p" option properly.

How, exactly, did you run bogoutil?

For example, running

  bogoutil -p /path/to/wordlist Kirrily

should show you how your name rates and

  bogoutil -p /path/to/wordlist .MSG_COUNT

will show how many ham and spam have been used in building your
wordlist.
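The message counts matter because a word's raw spam and ham counts only
mean something relative to how much of each you have trained.
Bogofilter's scoring is based on Gary Robinson's method.  The Python
sketch below assumes its per-token estimate follows Robinson's smoothed
f(w): counts are normalized by the spam and ham message totals, then
pulled toward a prior x with strength s (assumed to correspond to the
robx and robs settings).  The function name and the s and x values used
here are illustrative only.

```python
def robinson_fw(spam_count, ham_count, n_spam_msgs, n_ham_msgs, s, x):
    """Smoothed per-token spam probability, f(w), after Gary Robinson.

    spam_count, ham_count: times the token was seen in trained spam / ham
    n_spam_msgs, n_ham_msgs: trained message totals (what .MSG_COUNT reports)
    s: strength of the prior (must be > 0); x: assumed prob. for an unseen token
    """
    # Normalize each count by the size of its training corpus.
    b = spam_count / n_spam_msgs if n_spam_msgs else 0.0
    g = ham_count / n_ham_msgs if n_ham_msgs else 0.0
    n = spam_count + ham_count
    # Unsmoothed estimate; falls back to the prior when there is no evidence.
    p = b / (b + g) if (b + g) > 0 else x
    # Blend the prior and the evidence: few sightings stay close to x.
    return (s * x + n * p) / (s + n)


if __name__ == "__main__":
    # Same raw counts, different training balance (illustrative numbers).
    print(robinson_fw(5, 5, 100, 100, s=1.0, x=0.5))
    print(robinson_fw(5, 5, 50, 200, s=1.0, x=0.5))
```

With a token seen 5 times in spam and 5 times in ham, a balanced
training set leaves it at 0.5.  With 50 spam and 200 ham trained, the
same counts lean spammy at 8.5/11, about 0.77, so lopsided training can
shift scores noticeably.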

HTH,

David
