email containing token with high spamcount only gets an unsure

David Relson relson at osagesoftware.com
Thu Jul 20 13:39:12 CEST 2006


On Thu, 20 Jul 2006 10:31:12 +0200
Gerrit Thede wrote:

...[snip]....

> 
> Hi,
> thanks for your answers. Indeed, the rest of these messages seem to
> balance the scoring. Here's a histogram for another one of these:
> 
> bogofilter -C -d ~/.bogofilter -vv < msg.txt
> X-Bogosity: Unsure, tests=bogofilter, spamicity=0.500000, version=1.0.3
>    int  cnt   prob  spamicity histogram
>   0.00   42 0.052268 0.030049 ##########################################
>   0.10    7 0.111155 0.038573 #######
>   0.20    0 0.000000 0.038573
>   0.30    0 0.000000 0.038573
>   0.40    0 0.000000 0.038573
>   0.50    0 0.000000 0.038573
>   0.60    0 0.000000 0.038573
>   0.70    0 0.000000 0.038573
>   0.80    0 0.000000 0.038573
>   0.90   47 0.995219 0.546718 ###############################################
> 
> 
> Maybe I just need more of these, but it's really annoying. I thought a
> message with the same ending over and over again would be really
> easy to recognise as spam, but obviously it's not. bogofilter
> works perfectly for nearly all of the spam I get and I don't even
> have false positives. But these messages in particular seem to trick
> bogofilter really well.

With 49 low-scoring tokens and 47 high-scoring tokens, a net result of
0.5 seems reasonable.  You might want to run with "-vvv" and look at
the tokens it lists to see what's there.
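
To see why the two groups cancel, here is a minimal Python sketch of
Robinson-Fisher style combining (the chi-square approach bogofilter's
scoring is based on).  It is an approximation, not bogofilter's code:
it feeds in the per-bucket average "prob" values from the histogram
above rather than the real per-token values, and it ignores
bogofilter's tuning parameters.  The names chi2_sf_even and
fisher_score are made up for this illustration.

```python
import math

def chi2_sf_even(x, df):
    """Chi-square survival function, valid for an even df (closed form)."""
    m = x / 2.0
    term = total = math.exp(-m)
    for i in range(1, df // 2):
        term *= m / i
        total += term
    return min(total, 1.0)

def fisher_score(probs):
    """Combine per-token spam probabilities into a single score in [0, 1]."""
    n = len(probs)
    # Close to 1 only when the tokens are uniformly spammy.
    spam_ind = chi2_sf_even(-2.0 * sum(math.log(p) for p in probs), 2 * n)
    # Close to 1 only when the tokens are uniformly hammy.
    ham_ind = chi2_sf_even(-2.0 * sum(math.log(1.0 - p) for p in probs), 2 * n)
    return (1.0 + spam_ind - ham_ind) / 2.0

# Assumption: every token in a bucket sits at the bucket's average prob.
buckets = [(42, 0.052268), (7, 0.111155), (47, 0.995219)]
probs = [p for count, p in buckets for _ in range(count)]

mixed = fisher_score(probs)                  # 49 hammy + 47 spammy tokens
spam_only = fisher_score([0.995219] * 47)    # the 47 spammy tokens alone

print(f"mixed message:      {mixed:.4f}")
print(f"spammy tokens only: {spam_only:.4f}")
```

With both groups present, neither indicator gets anywhere near 1, so
the combined score lands close to 0.5.  The spammy tokens on their own
score close to 1.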

There is an area of weakness that I have noticed.  Mailing lists are
generally a source of ham and bogofilter learns that.  Some mailing
lists get spammed once in a while.  When that happens, bogofilter
sees lots of hammish tokens (from the message headers) and lots of
spammish tokens (from the message body).  This can result in an
unsure.  This situation is a difficult one.  I've managed to deal with
it by creating an ignore wordlist with header tokens from the mailing
list -- which forces bogofilter to look at the message body.  Perhaps
doing this would help you.
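
The effect of an ignore list can be sketched in Python as well.  This
is only a toy model under stated assumptions: the token names, their
probabilities, and the "head:" prefix test are invented for the
example, and the ignore list is modelled as a plain set whose members
are skipped before scoring.  It does not show how to build or
configure a real bogofilter ignore wordlist.

```python
import math

def chi2_sf_even(x, df):
    """Chi-square survival function, valid for an even df (closed form)."""
    m = x / 2.0
    term = total = math.exp(-m)
    for i in range(1, df // 2):
        term *= m / i
        total += term
    return min(total, 1.0)

def score(probs):
    """Robinson-Fisher style combination of per-token spam probabilities."""
    n = len(probs)
    if n == 0:
        return 0.5  # nothing left to judge
    spam_ind = chi2_sf_even(-2.0 * sum(math.log(p) for p in probs), 2 * n)
    ham_ind = chi2_sf_even(-2.0 * sum(math.log(1.0 - p) for p in probs), 2 * n)
    return (1.0 + spam_ind - ham_ind) / 2.0

# Hypothetical spam sent through a normally hammy mailing list:
# list-header tokens look hammy, body tokens look spammy.
message = {f"head:list-token-{i}": 0.05 for i in range(40)}
message.update({f"body-token-{i}": 0.99 for i in range(40)})

# Hypothetical ignore list holding the mailing list's header tokens.
ignore = {tok for tok in message if tok.startswith("head:")}

before = score(list(message.values()))
after = score([p for tok, p in message.items() if tok not in ignore])

print(f"all tokens:            {before:.4f}")
print(f"header tokens ignored: {after:.4f}")
```

With the header tokens in play the message sits near 0.5; once they
are skipped, only the body tokens are left and the score moves close
to 1.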

Regards,

David


