All mails have spamicity=0.5200000

Robin Bowes robin-lists at robinbowes.com
Tue Dec 13 12:21:24 CET 2005


David Relson said the following on 13/12/2005 00:16:
> On Mon, 12 Dec 2005 14:57:22 +0000
> Robin Bowes wrote:
> 
> 
>>Hi,
>>
>>I've recently (6th Dec) moved to bogofilter 1.0.0 and ended up starting
>>again with an empty wordlist.
>>
>>I'm running bogofilter from maildrop like this:
>>
>>BOGOFILTER="/usr/bin/bogofilter"
>>BOGOARGS="-e -p -u -d"
>>BOGODIR=/path/to/home/dir/.bogofilter
>>...
>>xfilter "${BOGOFILTER} ${BOGOARGS} ${BOGODIR}"
>>
>>Since then, I've accepted around 3000 msgs. I'm manually training by
>>dropping spam into a Spam/Undetected folder and processing this from a
>>cron job using the following command:
>>
>> $BOGOFILTER -Ns -d $BOGODIR < $message
>>
>>However, all msgs are still only getting a spamicity rating of 0.520000.
>>
>>bogoutil -H wordlist.sb shows this:
> 
> 
> ...[snip]....
> 
> 
>>It looks to me like something's not quite right.
> 
> 
> Hi Robin,
> 
> Bogofilter auto-updates with messages that it thinks to be spam or
> ham.  A score of 0.520000 indicates that bogofilter is discarding all
> the tokens of the message, hence is left with the default score (0.52).
> 
> Probably your problem is that with _no_ initial training, bogofilter is
> defaulting on all tokens, hence is adding _no_ tokens to the wordlist.
> It's necessary to give it at least 1 message so that it can start
> judging.  With this minimalist approach, the results will be pretty bad.
> However, since your plan is to correct the (numerous) mistakes, you
> should be OK -- after a while.
> 
> You can see how it's scoring a particular message by using "-vvv", as
> in:
> 
>   bogofilter -vvv < msg
> 
> This will display all the tokens and their scores.  From that info
> you'll be able to see what tokens were in the message and how
> bogofilter scores each of them.  More detail on this output format is
> in the FAQ.

David,

Thanks for the response.

I think the problem was that I was using tri-state mode. This meant that
all messages were initially labelled "unsure" and so were never added to
the wordlist.

I've since added "-o ,0" so everything that is not positively identified
as spam is marked as ham. Any spam not picked up is dropped into my
Spam/Undetected folder and re-classified.
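
For anyone else hitting this, here is a rough sketch of how I understand
the cutoffs to interact. It is a toy model, not bogofilter's actual code:
the classify function is made up for illustration, and the cutoff values
0.99 and 0.45 are assumed examples rather than values read from my
configuration. The point is only that 0.52 lands between the two cutoffs
in tri-state mode, while a ham cutoff of 0 leaves no "unsure" band.

```shell
#!/bin/sh
# Toy model of tri-state vs. two-state classification (hypothetical helper,
# not part of bogofilter). Usage: classify SCORE SPAM_CUTOFF HAM_CUTOFF
classify() {
    awk -v s="$1" -v sc="$2" -v hc="$3" 'BEGIN {
        if (s >= sc)                print "spam"    # at or above spam cutoff
        else if (hc > 0 && s > hc)  print "unsure"  # between the two cutoffs
        else                        print "ham"     # ham cutoff 0 means no unsure band
    }'
}

# Assumed example cutoffs: spam 0.99, ham 0.45 (tri-state)
classify 0.52 0.99 0.45   # prints "unsure"

# Ham cutoff forced to 0, as with "-o ,0" (two-state)
classify 0.52 0.99 0      # prints "ham"
```

With -u only messages classified as spam or ham get registered, so in the
first case nothing is learned and in the second case the message is
registered as ham.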

It's now working fine but, as you say, it takes some time to get the
wordlist trained again!

Thanks,

R.



