the importance of robx

Tom Anderson tanderso at oac-design.com
Sun Feb 29 17:15:32 CET 2004


On Sat, 2004-02-28 at 20:04, Greg Louis wrote:
> >  My robx is 0.48 and my min_dev is 0.2. 
> > This means that hapaxes will have no effect on your classifications.
> 
> I think you mean unknowns.  If a token has been seen exactly once
> before, it will have quite a strong influence that will be diluted by x
> to the degree specified by the s value.  Most of us use quite small s
> values so our hapaxes count heavily in classification.  I once removed
> all hapaxes from my training db to see what would happen, and
> bogofilter's accuracy worsened by an order of magnitude!

Well, I use a large robs because I don't want new words counting very
much.  If a word has been seen just once before, or never, then its
score stays within min_dev of 0.5 and doesn't count towards
classification at all.
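
To make the arithmetic concrete, here is a minimal sketch in Python.  It
assumes Robinson's smoothing formula f(w) = (s*x + n*p) / (s + n), with
x = robx and s = robs, and it assumes a token counts only when
|f(w) - 0.5| is at least min_dev.  The function names and the robs
values tried are illustrative, not bogofilter's own code or defaults;
robx = 0.48 and min_dev = 0.2 are the values quoted above.

```python
def smoothed_score(p, n, robx, robs):
    """Blend the observed spamicity p (from n sightings) with the prior robx.

    robs is the weight given to the prior: the larger it is, the more
    sightings are needed before p dominates.
    """
    return (robs * robx + n * p) / (robs + n)


def counts(fw, min_dev):
    """Assumed rule: a token is used only if it deviates enough from 0.5."""
    return abs(fw - 0.5) >= min_dev


ROBX = 0.48     # value quoted in this thread
MIN_DEV = 0.2   # value quoted in this thread

# (label, observed spamicity p, number of sightings n)
TOKENS = (
    ("unknown", 0.0, 0),      # never seen: p is irrelevant when n == 0
    ("spam hapax", 1.0, 1),   # seen once, in spam only
    ("ham hapax", 0.0, 1),    # seen once, in ham only
)

for robs in (0.01, 1.0, 2.0):  # illustrative small, medium, large values
    for label, p, n in TOKENS:
        fw = smoothed_score(p, n, ROBX, robs)
        verdict = "counts" if counts(fw, MIN_DEV) else "ignored"
        print(f"robs={robs:<5} {label:<11} f(w)={fw:.3f} {verdict}")
```

Under these assumptions an unknown token is always ignored, since its
score is just robx.  A hapax is ignored only once robs is large enough
to pull f(w) inside the min_dev band; with these numbers that happens
somewhere between robs = 1 and robs = 2.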

> not willing at the moment to try to explain this theoretically (it's

An explanation of your hypothesis would be nice.  I don't use bogotune,
as I don't keep large volumes of spam kicking around.  Nor will I ever
wish to.  Therefore, I tune manually depending on trends that I see.  I
don't assume a priori that bogotune gives an accurate basis for such a
conclusion as you've presented.  Theory would be appreciated.

Tom
