[bogofilter] bogotune weirdness

David Relson relson at osagesoftware.com
Mon May 10 13:59:09 CEST 2004


On 10 May 2004 07:49:45 -0400
Tom Anderson wrote:

> On Sun, 2004-05-09 at 21:46, Tom Allison wrote:
> > db_cachesize=5
> > robx=0.600000
> > min_dev=0.465
> > robs=0.0316
> > spam_cutoff=0.081       # for 0.05% fpos (3); expect 0.02% fneg (1).
> > #spam_cutoff=0.017      # for 0.10% fpos (7); expect 0.02% fneg (1).
> > #spam_cutoff=0.001      # for 0.20% fpos (14); expect 0.02% fneg (1).
> > ham_cutoff=0.450
> > 
> > 
> > I just tried this on some of my sample files.
> > 
> > Can someone explain how you can get spam_cutoff < ham_cutoff, and
> > what happens to the scoring when both conditions X<ham_cutoff and
> > X>spam_cutoff are met?
> 
> Your robx and min_dev would make it so that it couldn't happen to
> individual tokens.  Anything in the range between spam_cutoff and
> ham_cutoff is ignored due to your min_dev.  First-seen tokens are
> scored above the ham_cutoff.  And if the whole message scores in that
> range... well, then I don't know about that... unsure, maybe?
> 
> In any event, these values are simply unreasonable.  I'd pick values
> that make sense according to theory.  If reality doesn't match theory,
> then something is wrong with the programming, as it doesn't fit the
> spec.
> 
> Tom

Hi Toms,

The very low spam cutoffs suggest that the wordlist has been given bad
information, i.e. spam registered as ham or ham registered as spam.
Bogotune is supposed to warn if there are too many low-scoring spam
messages, but I don't see that warning in the output.
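
To make the question concrete, here is a minimal Python sketch (not
bogofilter's actual code) using the values quoted above.  It assumes
two things that are not stated in this thread: that a token is used
only when its score is at least min_dev away from 0.5, and that the
message score is tested against spam_cutoff first and ham_cutoff
second.  The function names are made up for illustration.

```python
# Values taken from the bogotune output quoted above.
ROBX = 0.6
MIN_DEV = 0.465
SPAM_CUTOFF = 0.081
HAM_CUTOFF = 0.450


def token_is_used(prob, min_dev=MIN_DEV):
    """Assumption: a token counts only if it is min_dev or more from 0.5."""
    return abs(prob - 0.5) >= min_dev


def classify(score, spam_cutoff=SPAM_CUTOFF, ham_cutoff=HAM_CUTOFF):
    """Assumption: the spam test comes first, then ham, otherwise unsure."""
    if score >= spam_cutoff:
        return "spam"
    if score <= ham_cutoff:
        return "ham"
    return "unsure"


print(token_is_used(ROBX))  # False: a first-seen token at robx is ignored
print(classify(0.20))       # spam: meets both conditions, spam test wins
print(classify(0.05))       # ham: below spam_cutoff
```

Under those assumptions a message scoring between the two cutoffs is
called spam, and "unsure" can never be returned while spam_cutoff is
at or below ham_cutoff.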

I'll follow up on this later today.  Gotta do some work work.

David


