the importance of robx

David Relson relson at osagesoftware.com
Sun Feb 29 00:51:23 CET 2004


Greg,

While looking at "bogotune -vv" data, I've noticed that a small change
in the robx value can result in a large change in the fn (false
negative) value. It occurred to me that robx is more important than
previously thought.

robx is the score for unknown words.  I've always thought of "unknown"
as being a temporary, somewhat anomalous, condition.  Once words pass
through that state and are known, their spamicity is a combination
of non-zero spam/ham counts and robx.  So I had assumed that robx is
not very significant.

And then I thought of a wordlist histogram, with its large numbers of
pure ham/spam words and almost as large numbers of hapaxes.  Given that
hapaxes are so numerous, one can conclude that many words never get
beyond their hapax/robx value.
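
To put rough numbers on that, here is a minimal sketch in Python. It
assumes the usual Robinson smoothing, f(w) = (robs*robx + n*p(w)) /
(robs + n), where n is the word's total count and p(w) is its raw spam
probability. The function name, the per-message normalization, and the
robs value of 1.0 are illustrative assumptions, not bogofilter's actual
code or defaults.

```python
def robinson_score(spam_count, ham_count, robx, robs, spam_msgs=1, ham_msgs=1):
    """Smoothed spamicity of one word (hypothetical helper, not bogofilter code)."""
    n = spam_count + ham_count
    if n == 0:
        # Unknown word: the formula reduces to robx itself.
        return robx
    # Raw probability from per-message frequencies in each corpus.
    spam_rate = spam_count / spam_msgs
    ham_rate = ham_count / ham_msgs
    p = spam_rate / (spam_rate + ham_rate)
    # Robinson smoothing: robx acts like robs "virtual" observations.
    return (robs * robx + n * p) / (robs + n)


if __name__ == "__main__":
    robs = 1.0  # illustrative strength, chosen to make the effect visible
    for robx in (0.4, 0.6):
        hapax = robinson_score(1, 0, robx, robs)        # seen once, in spam
        frequent = robinson_score(50, 0, robx, robs)    # seen 50 times, in spam
        print(f"robx={robx}: hapax={hapax:.4f} frequent={frequent:.4f}")
```

Under these assumptions, moving robx from 0.4 to 0.6 shifts the hapax's
score from about 0.70 to about 0.80, while the word seen 50 times barely
moves. The more hapaxes a message contains, the more robx matters.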

Additionally, if not using full training, many words will never get
beyond their initial (hapax) state.

This means that robx is, indeed, of more than passing importance.

Interesting, eh??

David


-- 
David Relson                   Osage Software Systems, Inc.
relson at osagesoftware.com       Ann Arbor, MI 48103
www.osagesoftware.com          tel:  734.821.8800



