significant digits?
Jonathan Buzzard
jonathan at buzzard.org.uk
Fri Apr 11 13:27:01 CEST 2003
relson at osagesoftware.com said:
> Calculations are done using variables of type "double", which varies
> between architectures. On Intel, a double is 8 bytes (64 bits).
> According to file float.h, a double has 15 decimal digits of
> precision. Bogofilter uses as much as the hardware will give it.
You miss the point. Because 0.95 cannot be represented exactly as a
binary fraction, it is a poor choice for a cutoff value.
My numerical analysis notes from university are quite adamant that
floating point comparisons should be against a number that can be
represented exactly as a binary fraction whenever possible.
What happens is that you might get a spamicity that is just a bit
less than the cutoff value of 0.95, but it then gets classified as spam
because 0.95 is represented as a slightly smaller number internally. You
then get puzzled people wondering why something with a spamicity less
than the specified cutoff value of 0.95 is classified as spam.
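
To illustrate, here is a minimal sketch in C. The helper names
is_spam and print_stored are made up for this example and are not
Bogofilter's code, and the >= test is only an assumption about how a
cutoff comparison might be written. It also assumes IEEE 754 doubles.

```c
#include <stdio.h>

/* Hypothetical cutoff test: spam when the score reaches the cutoff.
 * This is an assumed form, not Bogofilter's actual comparison. */
int is_spam(double spamicity, double cutoff)
{
    return spamicity >= cutoff;
}

/* Print the value that is really stored for a given constant,
 * using more digits than a double can represent faithfully. */
void print_stored(const char *label, double value)
{
    printf("%-8s stored as %.20f\n", label, value);
}

/* Show the problem case next to a cutoff that is exact in binary. */
void demo_cutoff(void)
{
    /* Written out in decimal, this score is below 0.95, but it rounds
     * to the same double as the constant 0.95 does. */
    double score = 0.94999999999999997;

    print_stored("0.95", 0.95);
    print_stored("0.9375", 0.9375);   /* 15/16, exact in binary */

    printf("score >= 0.95   : %d\n", is_spam(score, 0.95));
    printf("0.9374 >= 0.9375: %d\n", is_spam(0.9374, 0.9375));
}
```

Calling demo_cutoff() from main() should show the stored 0.95 sitting
slightly below 0.95, while 0.9375 is stored exactly, so a comparison
against it behaves the way the decimal figures suggest.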
It is a bit like using a step size of 0.1 in a loop, where the
representation error is added in again at every step, though not as bad.
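
For comparison, the step size case can be sketched as follows. The
helper names sum_steps and nth_step are hypothetical, and the sketch
assumes IEEE 754 doubles. The volatile qualifier is there only so that
each partial sum is stored as a double rather than kept in a wider
register.

```c
#include <stdio.h>

/* Add `step` to a running total `n` times (hypothetical helper). */
double sum_steps(double step, int n)
{
    volatile double total = 0.0;  /* store every partial sum as a double */
    int i;

    for (i = 0; i < n; i++)
        total += step;
    return total;
}

/* Alternative: count in integers and compute the position directly,
 * so that there is one rounding instead of one per step. */
double nth_step(double step, int i)
{
    return i * step;
}

/* Compare a step that is inexact in binary with one that is exact. */
void demo_steps(void)
{
    printf("10 x 0.1   == 1.0 ? %d\n", sum_steps(0.1, 10) == 1.0);
    printf(" 8 x 0.125 == 1.0 ? %d\n", sum_steps(0.125, 8) == 1.0);
    printf("10 * 0.1   == 1.0 ? %d\n", nth_step(0.1, 10) == 1.0);
}
```

With 0.1 the repeated additions drift away from 1.0, whereas 0.125
(one eighth) is exact in binary and lands on 1.0. A single cutoff
comparison has only the one representation error and nothing
accumulates, which is why it is not as bad.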
JAB.
--
Jonathan A. Buzzard Email: jonathan at buzzard.me.uk
Northumberland, United Kingdom. Tel: +44 1661-832195