Too many "unsures"
David Relson
relson at osagesoftware.com
Tue May 8 05:07:57 CEST 2007
On Mon, 7 May 2007 18:20:02 -0700
Kirrily Robert wrote:
> I installed bogofilter about a month ago and initially trained it on a
> hundred or so of spam and ham. I then set up a mail filter which puts
> spam in a junk folder and ham/unsure in my inbox. It also puts a copy
> of unsures into an "unsure" folder, so that I can go and make sure
> that I manually mark them as spam/ham. I do this via a mutt alias
> which calls bogofilter -n or bogofilter -s on each email.
>
> Problem is, after a month or so of training, I'm still getting
> hundreds of "unsures" a day. This morning's batch (164 messages) was
> about 20% spam but the rest ham, much of it from mailing list threads
> that have been talking about the same subject for days or weeks.
> Surely it should've figured out that that thread's ham by now?
>
> So I looked at the list archives and found this thread:
> http://www.bogofilter.org/pipermail/bogofilter/2007-March/009176.html
>
> I tried running bogoutil -p on my wordlist.db, but it hung, so I can't
> see whether my spam and ham were getting registered.
>
> Here's what -V tells me:
>
> bogofilter version 1.1.5
> Database: Sleepycat Software: Berkeley DB 3.2.9: (February 1,
> 2005) AUTO-XA
>
> Any suggestions?
>
> K.
Hello Kirrily,
Given your training regime, I'm surprised to hear that you're getting
hundreds of unsures. Have you looked at the X-Bogosity: lines for
them? It would be particularly useful to compare the scores of the
unsures that are really spam with the scores of those that are really
ham. Depending on where those scores fall, you might wish to change
the spam/ham/unsure boundaries in bogofilter's config file.
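
One way to see where the unsures are landing is to pull the scores
straight out of the headers. The following is only a minimal sketch and
not part of bogofilter: the function name bogosity_scores is made up for
illustration, and it assumes each message carries a single-line header
of the form "X-Bogosity: Unsure, tests=bogofilter, spamicity=0.520000,
version=1.1.5" (the score shown here is just an example value).

```shell
#!/bin/sh
# Hypothetical helper (not shipped with bogofilter): list the spamicity
# score and classification recorded in each message's X-Bogosity header.
# Assumed header form:
#   X-Bogosity: Unsure, tests=bogofilter, spamicity=0.520000, version=1.1.5
# Usage: bogosity_scores FILE...
bogosity_scores() {
    # -h suppresses file names so only the header lines are passed on.
    grep -h '^X-Bogosity:' "$@" |
        # Keep "score classification"; lines that do not match are dropped.
        sed -n 's/^X-Bogosity: *\([A-Za-z]*\),.*spamicity=\([0-9.]*\).*/\2 \1/p' |
        # Lowest score first, so clusters near a boundary stand out.
        sort -n
}
```

Called as "bogosity_scores ~/Mail/unsure/cur/*" (a hypothetical maildir
path), it prints one "score classification" line per message. If the
ham clusters just above the lower boundary and the spam just below the
upper one, the ham_cutoff and spam_cutoff settings in the config file
are the ones to adjust.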
It can also be useful to look at bogofilter output using flags "-vv" or
"-vvv" to see what words in a message are causing the unsure result.
The FAQ has more info on these flags.
Bogoutil is working fine for me. Its man page describes how to use
the "-p" option properly.
How, exactly, did you run bogoutil?
For example, running

    bogoutil -p /path/to/wordlist Kirrily

should show you how your name rates and

    bogoutil -p /path/to/wordlist .MSG_COUNT

will show how many ham and spam have been used in building your
wordlist.
HTH,
David