significant digits?
David Relson
relson at osagesoftware.com
Fri Apr 11 13:52:46 CEST 2003
At 07:27 AM 4/11/03, Jonathan Buzzard wrote:
>relson at osagesoftware.com said:
> > Calculations are done using variables of type "double", which varies
> > between architectures. On Intel, a double is 8 bytes (64 bits).
> > According to file float.h, a double has 15 decimal digits of
> > precision. Bogofilter uses as much as the hardware will give it.
>
>You miss the point. Because 0.95 cannot be represented precisely
>as a binary fraction, it represents a poor choice for a cut off value.
>My numerical analysis notes from university are quite adamant that
>floating point comparisons should be against a number that can be
>precisely represented as a binary fraction whenever possible.
>
>What happens is that you might get a spamicity that is just a bit
>less than the cutoff value of 0.95, but it then gets classified as spam because
>0.95 is represented as a slightly smaller number internally. You then get
>puzzled people wondering why something with a spamicity less than the
>specified cutoff value of 0.95 is classified as spam.
>
>It is a bit like using a step size of 0.1, though not as bad.
>
>JAB.
Jonathan,
Agreed, 0.95 cannot be exactly represented as a binary fraction. I think it's
silly to worry about that. Most decimal fractions become infinitely repeating
fractions when represented in binary, just as most fractions become infinitely
repeating when represented as decimals.
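
As a rough illustration of the point about 0.95, here is a small Python sketch (Python is used only for convenience; bogofilter itself is not written in it). It assumes the platform's float is an IEEE 754 64-bit double, which is the usual case, and prints the exact value that actually gets stored for 0.95.

```python
import sys
from decimal import Decimal
from fractions import Fraction

# Decimal(float) converts the stored double exactly, with no rounding,
# so this shows the binary value that stands in for 0.95.
stored = Decimal(0.95)
print(stored)

# The stored value sits slightly below the decimal 0.95.
print(stored < Decimal("0.95"))  # True

# Every finite double is an integer over a power of two; 95/100 is not.
print(Fraction(0.95).denominator)

# Decimal digits a double is guaranteed to preserve (DBL_DIG in float.h).
print(sys.float_info.dig)  # 15 on IEEE 754 doubles
```

The difference only shows up around the 17th significant digit, which is beyond the 15 digits quoted from float.h.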
Exact binary representations are somewhat useful if you're doing equality
checks. They're minimally useful when doing inequality checks. Bogofilter
uses the spam_cutoff value to divide the results into two groups -
those with scores above and those with scores below. If we used a value
that did have an exact binary representation, it wouldn't act much differently.
We could find an exact binary fraction near 0.95 - perhaps 122/128 (0.953125)
is close enough - but what's the point?
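
To put numbers on this, here is a hedged Python sketch. The helper names nearest_dyadic and is_spam are made up for illustration, and the "score >= cutoff" rule is an assumption about how the comparison is done, not a quote from the bogofilter source. It finds the closest fraction with denominator 128 and shows how small the window is in which the inexact 0.95 could matter.

```python
from fractions import Fraction

def nearest_dyadic(x, bits):
    """Return the fraction k / 2**bits closest to x (hypothetical helper)."""
    denom = 2 ** bits
    return Fraction(round(x * denom), denom)

def is_spam(score, cutoff):
    # Assumed rule for illustration: at or above the cutoff counts as spam.
    return score >= cutoff

cutoff = 0.95

# Closest value to 0.95 with denominator 128, reduced to lowest terms.
exact = nearest_dyadic(cutoff, 7)
print(exact, float(exact))

# Distance between the decimal 0.95 and the double that stands in for it.
gap = Fraction("0.95") - Fraction(cutoff)
print(float(gap))

# Doubles in [0.5, 1) are spaced 2**-53 apart, so this is the next
# double below the stored cutoff.
below = cutoff - 2.0 ** -53
print(is_spam(below, cutoff), is_spam(cutoff, cutoff))
```

Under these assumptions, the only score that lands in the disputed window is the double that 0.95 itself rounds to, and that score prints as 0.95 anyway.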
You don't _have_ to use 0.95 as your spam_cutoff. If you'd rather use 0.50
or 0.75 or 0.875, just add the appropriate line to your bogofilter config file.
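
For anyone who does want a cutoff with an exact binary representation, a quick check along these lines would tell which candidates qualify. This is only a sketch; is_exact_in_binary is a hypothetical helper, not part of bogofilter.

```python
from fractions import Fraction

def is_exact_in_binary(text):
    """True if the decimal string converts to a double with no rounding."""
    # Fraction(text) is the exact decimal value; Fraction(float(text)) is
    # the exact value of the double it gets rounded to.
    return Fraction(float(text)) == Fraction(text)

for candidate in ("0.50", "0.75", "0.875", "0.95"):
    print(candidate, is_exact_in_binary(candidate))
```

The first three are sums of powers of two (1/2, 1/2 + 1/4, 1/2 + 1/4 + 1/8), so they pass; 0.95 does not.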
David