the importance of robx
Tom Anderson
tanderso at oac-design.com
Sun Feb 29 17:15:32 CET 2004
On Sat, 2004-02-28 at 20:04, Greg Louis wrote:
> > My robx is 0.48 and my min_dev is 0.2.
> > This means that hapaxes will have no effect on your classifications.
>
> I think you mean unknowns. If a token has been seen exactly once
> before, it will have quite a strong influence that will be diluted by x
> to the degree specified by the s value. Most of us use quite small s
> values so our hapaxes count heavily in classification. I once removed
> all hapaxes from my training db to see what would happen, and
> bogofilter's accuracy worsened by an order of magnitude!
Well, I use a large robs because I don't want new words counting very
much. If a word has been registered just once before, or is being seen
for the first time, its score stays within min_dev of 0.5 and doesn't
count towards classification at all.
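
To make the arithmetic concrete, here is a rough sketch of Robinson's
smoothing, f(w) = (s*x + n*p) / (s + n), with x = robx, s = robs, n the
number of times the token has been seen, and p its raw spam probability.
The robx and min_dev values are the ones quoted above. The robs values
(2.0 for "large", 0.01 for "small") are made up for illustration, the
function names are hypothetical, and whether the min_dev comparison is
strict or inclusive is an assumption.

```python
ROBX = 0.48     # x: score assumed for a token never seen before
MIN_DEV = 0.2   # tokens scoring closer than this to 0.5 are ignored


def robinson_f(p, n, s, x=ROBX):
    """Robinson's smoothed score: pull the raw probability p toward x.

    p: raw spam probability of the token (1.0 if seen only in spam)
    n: number of times the token has been seen in training
    s: robs, the weight given to the prior x (assumed > 0)
    """
    return (s * x + n * p) / (s + n)


def is_scored(f, min_dev=MIN_DEV):
    """True if the token is far enough from 0.5 to be counted.

    Assumption: the comparison is inclusive; the real filter may differ.
    """
    return abs(f - 0.5) >= min_dev


# A hapax seen once, in spam only: p = 1.0, n = 1.
hapax_large_s = robinson_f(p=1.0, n=1, s=2.0)    # hypothetical large robs
hapax_small_s = robinson_f(p=1.0, n=1, s=0.01)   # hypothetical small robs

# An unknown token: n = 0, so the score collapses to robx whatever s is.
unknown = robinson_f(p=0.0, n=0, s=2.0)

for label, f in [("hapax, robs=2.0", hapax_large_s),
                 ("hapax, robs=0.01", hapax_small_s),
                 ("unknown token", unknown)]:
    print(f"{label}: f={f:.3f} scored={is_scored(f)}")
```

With these numbers the hapax scores about 0.653 under the large robs,
which is inside the min_dev band and so ignored, but about 0.995 under
the small robs, which counts heavily. The unknown token sits at robx
either way.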
> not willing at the moment to try to explain this theoretically (it's
An explanation of your hypothesis would be nice. I don't use bogotune,
as I don't keep large volumes of spam kicking around. Nor will I ever
wish to. Therefore, I tune manually depending on trends that I see. I
don't assume a priori that bogotune gives an accurate basis for such a
conclusion as you've presented. Theory would be appreciated.
Tom