Re: Past uses of NaN?
On Nov 8 2010, Nate Hayes wrote:
The use for missing values was first perpetrated by C, and its rationale
for doing so may be politely described as demonstrably almost totally
bogus. In particular, its claim about the need for max(A,NaN) = A in
statistics is the converse of the truth. I could explain that in detail,
if people are interested.
I'd be very interested. Can you give more details?
Certainly. The performance and usability arguments are bogus, because
saying that there are uses where max(x,NaN) = NaN would require an extra
test is irrelevant unless one can also show that those are more important
than the cases where max(x,NaN) = x requires the use of an extra test!
I queried that and got no answer; I believe there is no such evidence.
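
To make the symmetry concrete, here is a minimal Python sketch. All four function names are hypothetical and chosen only for illustration. Whichever of the two definitions is supplied as the primitive, the other can be had at the cost of an extra NaN test, so the "extra test" argument cuts both ways.

```python
import math

def max_nan_ignored(x, y):
    """Primitive A (hypothetical name): max(x, NaN) = x, as in C's fmax."""
    if math.isnan(x):
        return y
    if math.isnan(y):
        return x
    return x if x > y else y

def max_nan_propagated(x, y):
    """Primitive B (hypothetical name): max(x, NaN) = NaN."""
    if math.isnan(x) or math.isnan(y):
        return math.nan
    return x if x > y else y

def propagated_from_ignored(x, y):
    """Given only primitive A, propagation needs an extra test."""
    if math.isnan(x) or math.isnan(y):   # the extra test
        return math.nan
    return max_nan_ignored(x, y)

def ignored_from_propagated(x, y):
    """Given only primitive B, ignoring NaN needs an extra test."""
    if math.isnan(x):                    # the extra test
        return y
    if math.isnan(y):
        return x
    return max_nan_propagated(x, y)
```

Neither direction is free, so the choice has to rest on which use is more common, not on the mere existence of an extra test.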
As far as I know, the use of missing values in statistical applications
was pioneered by Rothamsted Experimental Station in the 1960s, like
several other technologies (possibly including spreadsheets), but
followed established statistical practice. Here is a summary of the
rules:
A missing value stands for a value that is almost certainly valid, but
is absent for some reason INDEPENDENT of its data domain - e.g. it is
NOT the same as a censored or truncated value. Those are trickier, but
can be represented using interval arithmetic. Anyway, let's ignore them.
In ordinary scalar or elemental expressions, if any operand is invalid,
the result is invalid; otherwise, if any operand is missing, the result
is missing; otherwise the expression is evaluated. Missing values use
special percolation rules in only two cases:
1) In reductions, if any operand is invalid, the result is invalid;
otherwise, the reduction is performed as if the elements with missing
operands were not present. That is the origin of C's aberration, but
note that it applies ONLY in reductions and NOT to invalid values.
2) There are algorithms to interpolate missing values from the ones
that are not missing and, obviously, they are treated differently from
invalid in that. But those algorithms do NOT use a simple x = op(x,a)
to interpolate the missing values, so are irrelevant.
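
The elemental rule and the reduction rule in (1) can be sketched in Python as follows. This is only an illustration: the sentinels INVALID and MISSING and the helpers elemental() and reduction() are hypothetical names, not part of any standard or library. The result of a reduction in which every element is missing is not covered by the rules above; returning MISSING in that case is an assumption of this sketch.

```python
from functools import reduce
import operator

class _Marker:
    """Distinct sentinel objects, kept separate from ordinary numbers."""
    def __init__(self, name):
        self.name = name
    def __repr__(self):
        return self.name

INVALID = _Marker("INVALID")   # hypothetical: the result of an invalid operation
MISSING = _Marker("MISSING")   # hypothetical: a valid but absent value

def elemental(op, *operands):
    """Scalar/elemental rule: invalid beats missing, missing beats values."""
    if any(x is INVALID for x in operands):
        return INVALID
    if any(x is MISSING for x in operands):
        return MISSING
    return op(*operands)

def reduction(op, values):
    """Reduction rule: invalid still poisons; missing elements are skipped."""
    values = list(values)
    if any(v is INVALID for v in values):
        return INVALID
    present = [v for v in values if v is not MISSING]
    if not present:
        return MISSING   # assumption: the rules above do not specify this case
    return reduce(op, present)
```

For example, elemental(operator.add, 1.0, MISSING) gives MISSING, while reduction(max, [1.0, MISSING, 3.0]) skips the missing element and gives 3.0. An INVALID anywhere in either call gives INVALID.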
There is also a further, very important, point. In general, the most
important reductions are a count and addition, by a long way, followed
by multiplication, followed a very long way later by max/min, Boolean
operations etc. So why should the argument apply to max/min and not
addition? And where is the IEEE 754 count operation, which is also
critical?
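
To see why the count matters, consider a mean over data with missing values. The following Python sketch assumes, purely for illustration, that NaN is used to encode a missing value; the function names are hypothetical. The sum alone is useless without the number of elements actually present.

```python
import math

def count_present(values):
    """Count reduction: the number of elements that are not missing."""
    return sum(1 for v in values if not math.isnan(v))

def sum_present(values):
    """Addition reduction, skipping missing elements."""
    return math.fsum(v for v in values if not math.isnan(v))

def mean_present(values):
    """Mean of the present elements; needs both reductions above."""
    n = count_present(values)
    if n == 0:
        return math.nan   # assumption: nothing present, so the mean is missing
    return sum_present(values) / n
```

With the data [2.0, NaN, 4.0], dividing the sum by len(data) would give 2.0, whereas dividing by the count of present elements gives the intended mean of 3.0.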
Regards,
Nick Maclaren.