the importance of robx

David Relson relson at osagesoftware.com
Sun Feb 29 14:17:18 CET 2004


On Sun, 29 Feb 2004 10:29:15 +0100
Boris 'pi' Piwinger wrote:

> David Relson <relson at osagesoftware.com> wrote:
> 
> >robx is the score for unknown words.  I've always thought of
> >"unknown" as being a temporary, somewhat anomalous, condition.  Once
> >words pass through that state and are known, then their spamicity is
> >a combination of non-zero spam/ham counts and robx.
> 
> That really depends on robs. If that is small, robx won't do
> much. Also the overall number of messages comes into play.
> 
> I once asked (nobody could answer) how those values should
> work with training on error, where hapaxes are clearly more
> important. Also, the number of messages seen is much
> smaller.

Hi pi,

All the parameters matter and the interactions between them are complex.
As you point out, we don't have the knowledge to answer every question. 
As time goes by, we gain experience and learn more.

What prompted this thread was noticing that, with robs, min_dev, and fn
held constant, changing robx yields a different spam_cutoff (as expected)
and a different false positive count (also expected).  The surprise was
how much the fp count changed.

As Greg points out, had I looked at the lines where robs (or min_dev)
was the one that differed, I would have seen a comparable result.
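To make the robs/robx interaction concrete, here is a minimal Python sketch of Gary Robinson's per-word estimate, f(w) = (s*x + n*p(w)) / (s + n). It assumes s is robs, x is robx, n is the number of training messages containing the word, and p(w) is the word's spam ratio normalized by the spam and ham message totals. bogofilter's actual C code may differ in detail. The function names (`raw_spamicity`, `robinson_fw`, `is_significant`) are hypothetical, as is the exact min_dev boundary test.

```python
"""Sketch of Robinson's f(w) with bogofilter-style parameter names.

Assumptions (not taken from bogofilter's source):
  * robs is the prior strength s, robx is the prior spamicity x;
  * p(w) normalizes spam/ham counts by the number of spam/ham messages;
  * a token is ignored when |f(w) - 0.5| <= min_dev.
"""


def raw_spamicity(spam_count: int, ham_count: int,
                  spam_msgs: int, ham_msgs: int) -> float:
    """p(w): the word's spam ratio, normalized by training-set sizes.

    Assumes spam_msgs > 0, ham_msgs > 0, and the word was seen at least once.
    """
    b = spam_count / spam_msgs
    g = ham_count / ham_msgs
    return b / (b + g)


def robinson_fw(spam_count: int, ham_count: int,
                spam_msgs: int, ham_msgs: int,
                robs: float, robx: float) -> float:
    """f(w) = (s*x + n*p(w)) / (s + n); an unknown word (n == 0) scores robx."""
    n = spam_count + ham_count
    if n == 0:
        return robx
    p = raw_spamicity(spam_count, ham_count, spam_msgs, ham_msgs)
    return (robs * robx + n * p) / (robs + n)


def is_significant(fw: float, min_dev: float) -> bool:
    """Keep only tokens whose f(w) lies farther than min_dev from 0.5."""
    return abs(fw - 0.5) > min_dev


if __name__ == "__main__":
    # A hapax: seen once, in spam, with 1000 spam and 1000 ham messages trained.
    for robs in (0.01, 1.0):
        for robx in (0.4, 0.6):
            fw = robinson_fw(1, 0, 1000, 1000, robs, robx)
            print(f"robs={robs} robx={robx} hapax f(w)={fw:.3f}")
```

With robs=1.0 the hapax moves from 0.70 to 0.80 as robx goes from 0.4 to 0.6, while with robs=0.01 it barely moves (about 0.994 vs. 0.996), which matches pi's point that a small robs leaves robx little influence. Unknown words score exactly robx, so under these assumptions a robx change can push every unknown token across the min_dev threshold at once (e.g. robx=0.52 is ignored at min_dev=0.1, robx=0.65 is not). That is one plausible way a single parameter could shift the fp count sharply, not a confirmed explanation of the results above.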

David




More information about the Bogofilter mailing list