Re: (long) sNaNs not what they could be...
Dan,
Thanks for sharing the story. It is informative.
George
On Oct 15, 2010, at 5:01 PM, Dan Zuras Intervals wrote:
> For anyone not interested in this topic, it
> will be a long diatribe on the inadequacies
> of NaNs as a diagnostic tool in 754.
>
> I apologise for posting it in this forum.
>
> Feel free to ignore & delete.
>
> For the rest (if any) ...
>
>> Date: 15 Oct 2010 21:35:26 +0100
>> From: "N.M. Maclaren" <nmm1@xxxxxxxxx>
>> To: Dan Zuras Intervals <intervals08@xxxxxxxxxxxxxx>
>> Cc: Lee Winter <lee.j.i.winter@xxxxxxxxx>, stds-1788@xxxxxxxxxxxxxxxxx
>> Subject: Re: Fw: Useless sNaNs... or useful?
>>
>> On Oct 15 2010, Dan Zuras Intervals wrote:
>>>>>>
>>>>>> One important use is to initialize all floating point variables to
>>>>>> signaling NaNs. If they are inadvertently not properly initialized
>>>
>>> What you & Ian probably don't know is that
>>> I, along with Prof Kahan & Bob, advocated
>>> for consistent behaviors for signalling NaNs
>>> that would permit exactly this application.
>>
>> Good. That would restore functionality that some compilers had
>> in 1970, though signalling NaNs is only one of many ways to achieve
>> the objective. It does, however, have the advantage that it has no
>> performance impact on code without such errors.
>>
>>> As it happened there were sound technical
>>> reasons for features that made it infeasible.
>>> All of which were thought to be more important
>>> than this application. All of which I can
>>> describe for you in detail but hesitate to do
>>> so in this forum.
>>
>> I am unaware of any, though I am aware of a great
>> many UNSOUND technical reasons :-(
>>
>> Please could you describe the sound technical reasons?
>> If you are reluctant to do so in a semi-public forum,
>> Email will do.
>>
>>
>> regards,
>> Nick Maclaren.
>
> I will outline it.
>
> But even that outline will take some time to
> explain.
>
> What we were after was a 'touch it & die NaN'.
> Something whose mere dereferencing would cause
> an invalid trap. Presumably to a debugger or
> to signal the use of an uninitialized variable.
>
> The method would be to fill memory with this
> fatal signalling NaN so that any read access
> would explode the mine. If your first use of
> memory was to write to it, you were safe. But
> if you read from it you would die the death of
> the uninitialized memory signalling NaN.
>
> Sounds simple enough.
>
> Why was that hard to do?
>
> Well, on most systems the load instruction is
> not typed. It has a width but not a type.
> Thus, if I am reading, say, a 32-bit quantity
> out of memory I generally don't know whether
> it is an integer, 4 characters, a single
> precision floating-point number, or part of
> a larger structure (either a larger floating-
> point number or some non-floating-point
> structure).
>
> So, die on load was not really feasible.
>
> No matter, it would be sufficient to die on
> first floating-point touch.
>
> Which would be fine if all floating-point
> touches went through the floating-point ALU.
> Alas, on many systems (Intel included) there
> are 3 floating-point operations that do not.
> They are copy, negate, & absolute value.
> They can avoid the ALU because they are just
> bit copies with a possible modification of
> the sign bit. And they are singled out in
> the standard for this reason.
>
> Of these the most important is copy. It is
> used on assignment.
>
> Or not. You see, modern optimizers are such
> that copies are generally eliminated except
> in rare cases. So, even if we were to 'arm'
> copies to trigger invalid we can't count on
> them actually being there.
>
> It is a bit more complex, but much the same
> can be said for negate. Unary negates are
> mostly eliminated by manipulation of prior
> or subsequent add-like operations (changing
> add to subtract or one kind of FMA to
> another).
>
> Absolute value is generally safe but also
> not often used.
>
> So these operations provide a hole through
> which an uninitialized value can slip
> unnoticed.
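
A minimal sketch of why these three operations offer no place to trap, assuming the binary64 layout and the quiet-bit = 1 convention described in this post. The patterns are handled as plain Python integers, so no floating-point hardware is involved; the function names and the SNAN64 constant are illustrative only.

```python
# Bit-level view of the three "non-ALU" operations on a binary64 pattern.
# Patterns are plain integers, so nothing here touches a floating-point unit.

SIGN64 = 1 << 63                  # sign bit of a 64-bit pattern
# Illustrative sNaN (assumed): exponent all 1s, quiet bit 0, payload nonzero.
SNAN64 = 0x7FF4_0000_0000_0000

def copy_bits(x: int) -> int:
    """Copy: the bits pass through untouched."""
    return x

def negate_bits(x: int) -> int:
    """Negate: flip the sign bit, leave everything else alone."""
    return x ^ SIGN64

def abs_bits(x: int) -> int:
    """Absolute value: clear the sign bit, leave everything else alone."""
    return x & (SIGN64 - 1)

for op in (copy_bits, negate_bits, abs_bits):
    print(f"{op.__name__:12s} {op(SNAN64):#018x}")
```

In every case the exponent and payload come out unchanged, so the value is still a signalling NaN afterwards and nothing has examined it.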
>
> No matter, we'll get them on the first
> arithmetic operation.
>
> But wait a minute, just what was the value
> of that uninitialized NaN?
>
> The one we were seeking was the all 1's NaN.
> The reason for this was that, for most
> computers, it is easy to fill memory with
> all zeros or all ones. Or even all copies
> of some particular byte. But filling memory
> with copies of anything more complex
> than that involves copying from a register
> or one place in memory to another. And that
> is a much slower operation.
>
> So, we wanted all 1's. That would make it
> easy & fast.
>
> But, as it happened, we recommended (in the
> sense of 'should') that the all 1's NaN be
> a quiet NaN. The reason for this is that
> the most common thing one does with a
> signalling NaN is to quiet it. If we had
> to do that by turning a "I'm a signalling
> NaN" bit from a 1 to a zero there was a
> danger of turning a NaN (with only that
> bit set) into an infinity.
>
> The technical term for this was "It would
> be bad".
>
> So the (strong) recommendation was that the
> bit that distinguished signalling NaNs from
> quiet ones take on the values 0 for signalling
> & 1 for quiet. That way there would always
> be a quiet NaN to 'land on' when one quiets
> some valid signalling NaN.
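
The hazard can be sketched on binary32 bit patterns. This is only an illustration under the assumption that the distinguishing bit is the top bit of the trailing significand; the helper names are made up for the example, and the "clearing" variant models the hypothetical opposite convention (1 = signalling), not any particular machine.

```python
# Quieting a signalling NaN under the two possible conventions (binary32).

EXP32  = 0x7F80_0000   # exponent field, all ones for NaN and infinity
FRAC32 = 0x007F_FFFF   # trailing significand field
QBIT32 = 0x0040_0000   # top bit of the trailing significand

def is_nan32(x: int) -> bool:
    return (x & EXP32) == EXP32 and (x & FRAC32) != 0

def is_inf32(x: int) -> bool:
    return (x & 0x7FFF_FFFF) == EXP32

def quiet_by_setting(x: int) -> int:
    """Recommended convention: 0 = signalling, 1 = quiet."""
    return x | QBIT32

def quiet_by_clearing(x: int) -> int:
    """Hypothetical opposite convention: 1 = signalling, 0 = quiet."""
    return x & ~QBIT32 & 0xFFFF_FFFF

smallest_snan = 0x7F80_0001   # signalling under the recommended convention
only_flag_set = 0x7FC0_0000   # signalling under the opposite convention

print(hex(quiet_by_setting(smallest_snan)), is_nan32(quiet_by_setting(smallest_snan)))
print(hex(quiet_by_clearing(only_flag_set)), is_inf32(quiet_by_clearing(only_flag_set)))
```

Setting the bit can never empty the significand, so the result is always a NaN. Clearing it on a NaN whose only set bit is the flag leaves 0x7F800000, which is +infinity.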
>
> So the all 1's NaN would not do. It had to
> be something else. It had to be something
> that had ones in some places (where the
> exponent was) & at least one zero elsewhere
> (the signalling bit).
>
> But, single precision floating-point numbers
> have 8-bit exponents. Doubles have 11. And
> quads have 15. In each case the position of
> the signalling bit is (recommended to be) the
> bit just to the right of the rightmost exponent
> bit. Counting the sign bit (just for byte
> alignment) that means that 10, 13, or 17 bits
> matter. The rest don't, so long as at least
> one of them is nonzero.
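
As a rough check, the sketch below classifies fill patterns for the three formats. It assumes the IEEE 754-2008 binary interchange layouts (8, 11, and 15 exponent bits) with the quiet bit as the most significant bit of the trailing significand; the function and table names are invented for the illustration.

```python
# (total width, exponent width) for the three binary interchange formats.
FORMATS = {"binary32": (32, 8), "binary64": (64, 11), "binary128": (128, 15)}

def classify(bits: int, width: int, exp_bits: int) -> str:
    """Classify an integer bit pattern without converting it to a float."""
    frac_bits = width - 1 - exp_bits
    exp_max = (1 << exp_bits) - 1
    exponent = (bits >> frac_bits) & exp_max
    fraction = bits & ((1 << frac_bits) - 1)
    if exponent != exp_max:
        return "finite"
    if fraction == 0:
        return "infinity"
    quiet_bit = (fraction >> (frac_bits - 1)) & 1
    return "quiet NaN" if quiet_bit else "signalling NaN"

def leading_bits_that_matter(exp_bits: int) -> int:
    """Sign bit + exponent field + the quiet/signalling bit."""
    return 1 + exp_bits + 1

for name, (width, exp_bits) in FORMATS.items():
    all_ones = (1 << width) - 1
    quiet_pos = width - 1 - exp_bits - 1        # bit index, LSB = 0
    armed = all_ones ^ (1 << quiet_pos)         # all ones with the quiet bit cleared
    print(name,
          classify(all_ones, width, exp_bits),
          classify(armed, width, exp_bits),
          leading_bits_that_matter(exp_bits))
```

Under these assumptions the all 1's pattern is a quiet NaN in every format, and the bit that must be cleared to arm it sits at a different position in each one.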
>
> But that means that we have to fill memory
> with a value that presumes we know which type
> will be incorrectly referenced there.
>
> How can we know that?
>
> Further, some systems align 16, 32, & 64 bit
> memory references on 16, 32, or 64 bit aligned
> memory locations. Some don't.
>
> So not only do we have to know which type will
> be incorrectly referenced, we have to know its
> memory alignment.
>
> If we get either one wrong, the bit pattern
> will just look like some otherwise innocent
> floating-point number.
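
To make this concrete, here is a small simulation: memory is filled with a binary64 signalling NaN and then read back with the wrong width or alignment. It assumes little-endian byte order and the quiet-bit = 1 convention; SNAN64 and the helper names are illustrative, and only integer bit patterns are inspected.

```python
import struct

SNAN64 = 0x7FF4_0000_0000_0000            # illustrative binary64 sNaN (assumed)
memory = struct.pack("<Q", SNAN64) * 4    # 32 bytes of fill, little-endian

def read_u64(buf: bytes, offset: int) -> int:
    return struct.unpack_from("<Q", buf, offset)[0]

def read_u32(buf: bytes, offset: int) -> int:
    return struct.unpack_from("<I", buf, offset)[0]

def classify(bits: int, width: int, exp_bits: int) -> str:
    """Classify an integer bit pattern without converting it to a float."""
    frac_bits = width - 1 - exp_bits
    exp_max = (1 << exp_bits) - 1
    exponent = (bits >> frac_bits) & exp_max
    fraction = bits & ((1 << frac_bits) - 1)
    if exponent != exp_max:
        return "finite"
    if fraction == 0:
        return "infinity"
    return "quiet NaN" if (fraction >> (frac_bits - 1)) & 1 else "signalling NaN"

print("double, aligned    :", classify(read_u64(memory, 0), 64, 11))
print("double, offset by 4:", classify(read_u64(memory, 4), 64, 11))
print("single, offset 0   :", classify(read_u32(memory, 0), 32, 8))
print("single, offset 4   :", classify(read_u32(memory, 4), 32, 8))
```

Only the aligned double read sees a signalling NaN. The misaligned double is a tiny subnormal, one single-precision read is zero, and the other is a quiet NaN, so none of the mistaken reads would signal.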
>
> As the reference is presumed to be incorrect in
> the first place, how can we know how or why it
> is incorrect?
>
>
> Let's see, I may have missed something but I
> think that's most of it.
>
> Some of them may not apply to your computer.
>
> But I guarantee some of them do.
>
> So...
>
> -- We can't count on systems triggering
> an invalid trap if they encounter a
> signalling NaN because loads are not
> required to be typed.
>
> -- We can't count on knowing when we are
> touching a NaN because copies (& negates)
> are not required to go through the ALU.
>
> -- Even if we only count on trapping on
> arithmetic operations, some (like negate)
> are optimized out.
>
> -- We can't fill memory with all 1's
> because that is a quiet NaN.
>
> -- The kernel people won't fill it with
> any more complex pattern because it is
> noticeably slower to do so.
>
> -- Even if they would, we could only
> catch invalid references to a NaN of a
> known type & alignment. All others
> would slip through as some other number.
>
> When all was said & done, the diagnostic value of
> what could still be done within all these limitations
> was considered to be worth far less than the features
> that impose those limitations.
>
> So we had to give up on it.
>
> Still, some enterprising compiler writer or debugger
> writer out there COULD do something along these lines.
> It wouldn't buy them much but it would be interesting.
>
> So, I ask again: Please name anyone who is doing this.
>
> I'd really like to talk to them about it.
>
> So would Prof Kahan.
>
> <End of long sad story>
>
>
> Dan
Dr. George F. Corliss
Electrical and Computer Engineering
Marquette University
P.O. Box 1881
1515 W. Wisconsin Ave
Milwaukee WI 53201-1881 USA
414-288-6599; GasDay: 288-4400; Fax 288-5579
George.Corliss@xxxxxxxxxxxxx
www.eng.mu.edu/corlissg