
Fw: (long) sNaNs not what they could be...



I thank you too.

The Unix and z/OS operating systems and the C and C++ languages define all static and extern variables to be preinitialized to zeros by default. That doesn't rule out an operating system or compiler option to override the default, but, as you explained, an operating system doesn't know which locations hold floating-point variables of which precision and which hold integers, characters, pointers, etc. As you said, in theory a compiler could. I don't know of any of today's operating systems or compilers that do. Not all progress is in the forward direction.
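In a quick C sketch, that zero fill just reads back as an innocent 0.0 in any floating-point width, so a forgotten initialization is silently absorbed instead of flagged:

    #include <stdio.h>

    static double forgotten;   /* static storage: zero-filled by default */

    int main(void) {
        /* The all-zeros bit pattern decodes as +0.0 in every binary format, */
        /* so arithmetic on the never-initialized variable proceeds quietly. */
        printf("%g\n", forgotten + 1.0);   /* prints 1, no trap, no NaN */
        return 0;
    }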

Having the operating system do it is incomplete anyway. It can't cover auto (stack) variables, or heap storage once a memory location has been used, freed, and then reused for something different. That's why some compilers like IBM's XL C/C++ and XL Fortran have the -qinitauto=hexvalue compiler option to initialize the stack to whatever byte or word bit pattern you choose at the beginning of each function's execution. With the right value you can cover either single precision, or both double precision and IBM's "double double" quad precision. That's only a partial solution but still useful. In practice most programs are dominated by one precision.
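Sketched in plain C (with memset standing in for what the option arranges, and assuming the fill byte 0xFF), the effect is that anything read back as a float or double before being written is a NaN; with 0xFF the value read back happens to be a quiet NaN in both widths:

    #include <math.h>
    #include <stdio.h>
    #include <string.h>

    int main(void) {
        unsigned char stack_image[8];
        memset(stack_image, 0xFF, sizeof stack_image);   /* fill with 0xFF bytes */

        float  f;
        double d;
        memcpy(&f, stack_image, sizeof f);   /* 0xFFFFFFFF          -> quiet NaN */
        memcpy(&d, stack_image, sizeof d);   /* 0xFF...FF (64 bits) -> quiet NaN */

        printf("%d %d\n", isnan(f), isnan(d));   /* prints 1 1 */
        return 0;
    }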

The users I was referring to (no, I can't name them) fill their arrays with NaNs initially, and since they know which ones are floating point and what precision they are, they can choose the right NaN.
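In C it amounts to something like the sketch below, assuming the usual 2008-style binary64 encoding in which the quiet bit is the top significand bit (so 0x7FF4000000000000 is a signaling NaN); the byte copy avoids any floating-point instruction that might quiet it along the way:

    #include <math.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    int main(void) {
        /* exponent all ones, quiet bit clear, payload nonzero: a signaling NaN */
        const uint64_t snan_bits = 0x7FF4000000000000ULL;

        double a[1024];
        for (size_t i = 0; i < sizeof a / sizeof a[0]; i++)
            memcpy(&a[i], &snan_bits, sizeof a[i]);

        printf("%d\n", isnan(a[0]));   /* prints 1 */
        return 0;
    }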

The XL compilers' -qflttrap=nanq option generates code to check every load of a floating-point value and every operation producing one, and to trap if the value is a NaN, whether signaling or quiet. It's very useful for debugging, especially if you preinitialize to NaNs.
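There's no portable C equivalent, but on glibc/x86 you can get a slice of that behavior by unmasking the invalid-operation exception; feenableexcept is a GNU extension and, unlike -qflttrap=nanq, this only fires on arithmetic with signaling NaNs, not on loads and not on quiet ones:

    #define _GNU_SOURCE            /* for feenableexcept (glibc extension) */
    #include <fenv.h>
    #include <stdio.h>

    int main(void) {
        feenableexcept(FE_INVALID);              /* unmask invalid-op -> SIGFPE  */

        volatile double x = __builtin_nans("");  /* GCC/Clang signaling NaN      */
        volatile double y = x + 1.0;             /* arithmetic on the sNaN traps */

        printf("not reached: %g\n", y);          /* build: gcc -O0 trap.c -lm    */
        return 0;
    }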

WRT signaling NaNs, the world you envisioned would have been better than what we have, and that must be frustrating; but what we have is better than if you hadn't tried. So I thank you for what you did accomplish.

- Ian McIntosh IBM Canada Lab Compiler Back End Support and Development

----- Forwarded by Ian McIntosh/Toronto/IBM on 10/15/2010 07:17 PM -----


From:    "Corliss, George" <george.corliss@xxxxxxxxxxxxx>
To:      Ian McIntosh/Toronto/IBM@IBMCA
Date:    10/15/2010 07:06 PM
Subject: Re: (long) sNaNs not what they could be...

Dan,

Thanks for sharing the story.  It is informative.

George
On Oct 15, 2010, at 5:01 PM, Dan Zuras Intervals wrote:

> For anyone not interested in this topic, it
> will be a long diatribe on the inadequacies
> of NaNs as a diagnostic tool in 754.
>
> I apologise for posting it in this forum.
>
> Feel free to ignore & delete.
>
> For the rest (if any) ...
>
>> Date: 15 Oct 2010 21:35:26 +0100
>> From: "N.M. Maclaren" <nmm1@xxxxxxxxx>
>> To: Dan Zuras Intervals <intervals08@xxxxxxxxxxxxxx>
>> Cc: Lee Winter <lee.j.i.winter@xxxxxxxxx>, stds-1788@xxxxxxxxxxxxxxxxx
>> Subject: Re: Fw: Useless sNaNs... or useful?
>>
>> On Oct 15 2010, Dan Zuras Intervals wrote:
>>>>>>
>>>>>> One important use is to initialize all floating point variables to
>>>>>> signaling NaNs. If they are inadvertently not properly initialized
>>>
>>> What you & Ian probably don't know is that
>>> I, along with Prof Kahan & Bob, advocated
>>> for consistent behaviors for signalling NaNs
>>> that would permit exactly this application.
>>
>> Good.  That would restore functionality that some compilers had
>> in 1970, though signalling NaNs is only one of many ways to achieve
>> the objective.  It does, however, have the advantage that it has no
>> performance impact on code without such errors.
>>
>>> As it happened there were sound technical
>>> reasons for features that made it infeasible.
>>> All of which were thought to be more important
>>> than this application.  All of which I can
>>> describe for you in detail but hesitate to do
>>> so in this forum.
>>
>> I am unaware of any, though I am aware of a great
>> many UNSOUND technical reasons :-(
>>
>> Please could you describe the sound technical reasons?
>> If you are reluctant to do so in a semi-public forum,
>> Email will do.
>>
>>
>> regards,
>> Nick Maclaren.
>
> I will outline it.
>
> But even that outline will take some time to
> explain.
>
> What we were after was a 'touch it & die NaN'.
> Something for which even a dereference would
> cause an invalid trap.  Presumably to a
> debugger, or to signal the use of an
> uninitialized variable.
>
> The method would be to fill memory with this
> fatal signalling NaN so that any read access
> would explode the mine.  If your first use of
> memory was to write to it, you were safe.  But
> if you read from it you would die the death of
> the uninitialized memory signalling NaN.
>
> Sounds simple enough.
>
> Why was that hard to do?
>
> Well, on most systems the load instruction is
> not typed.  It has a width but not a type.
> Thus, if I am reading, say, a 32-bit quantity
> out of memory I generally don't know whether
> it is an integer, 4 characters, a single
> precision floating-point number, or part of
> a larger structure (either a larger floating-
> point number or some non-floating-point
> structure).
>
> So, die on load was not really feasible.
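>
> (A small C sketch of the problem, assuming
> the usual binary32 encoding: the same 32 bits
> are a perfectly ordinary integer, and only
> the float view is a NaN at all.)
>
>     #include <math.h>
>     #include <stdint.h>
>     #include <stdio.h>
>     #include <string.h>
>
>     int main(void) {
>         uint32_t word = 0x7FA00000;   /* a signaling-NaN pattern in binary32 */
>
>         int32_t as_int;
>         float   as_float;
>         memcpy(&as_int,   &word, 4);  /* "load" as an integer: just a number */
>         memcpy(&as_float, &word, 4);  /* "load" as a float: now it is a NaN  */
>
>         printf("%d %d\n", as_int, isnan(as_float));   /* 2141192192  1 */
>         return 0;
>     }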
>
> No matter, it would be sufficient to die on
> first floating-point touch.
>
> Which would be fine if all floating-point
> touches went through the floating-point ALU.
> Alas, on many systems (Intel included) there
> are 3 floating-point operations that do not.
> They are copy, negate, & absolute value.
> They can avoid the ALU because they are just
> bit copies with a possible modification of
> the sign bit.  And they are singled out in
> the standard for this reason.
>
> Of these the most important is copy.  It is
> used on assignment.
>
> Or not.  You see, modern optimizers are such
> that copies are generally eliminated except
> in rare cases.  So, even if we were to 'arm'
> copies to trigger invalid we can't count on
> them actually being there.
>
> It is a bit more complex, but much the same
> can be said for negate.  Unary negates are
> mostly eliminated by manipulation of prior
> or subsequent add-like operations (changing
> add to subtract or one kind of FMA to
> another).
>
> Absolute value is generally safe but also
> not often used.
>
> So these operations provide a hole through
> which an uninitialized value can slip
> unnoticed.
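>
> (A sketch of that hole in C, checking the
> invalid flag with fetestexcept; on SSE-class
> hardware copy, negate, & fabs compile to bare
> bit operations and leave the flag clear,
> while the add sets it.  Build with -O0.)
>
>     #include <fenv.h>
>     #include <math.h>
>     #include <stdio.h>
>
>     int main(void) {
>         volatile double snan = __builtin_nans("");  /* GCC/Clang signaling NaN */
>         volatile double sink;
>
>         feclearexcept(FE_INVALID);
>         sink = snan;                       /* copy           */
>         sink = -snan;                      /* negate         */
>         sink = fabs(snan);                 /* absolute value */
>         printf("%d\n", fetestexcept(FE_INVALID) != 0);   /* typically 0 */
>
>         sink = snan + 1.0;                 /* real arithmetic */
>         printf("%d\n", fetestexcept(FE_INVALID) != 0);   /* 1 */
>         return 0;
>     }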
>
> No matter, we'll get them on the first
> arithmetic operation.
>
> But wait a minute, just what was the value
> of that uninitialized NaN?
>
> The one we were seeking was the all 1's NaN.
> The reason for this was that, for most
> computers, it is easy to fill memory with
> all zeros or all ones.  Or even all copies
> of some particular byte.  But filling memory
> with all values of anything more complex
> than that involves copying from a register
> or one place in memory to another.  And that
> is a much slower operation.
>
> So, we wanted all 1's.  That would make it
> easy & fast.
>
> But, as it happened, we recommended (in the
> sense of 'should') that the all 1's NaN be
> a quiet NaN.  The reason for this is that
> the most common thing one does with a
> signalling NaN is to quiet it.  If we had
> to do that by turning a "I'm a signalling
> NaN" bit from a 1 to a zero there was a
> danger of turning a NaN (with only that
> bit set) into an infinity.
>
> The technical term for this was "It would
> be bad".
>
> So the (strong) recommendation was that the
> bit that distinguished signalling NaNs from
> quiet ones take on the values 0 for signalling
> & 1 for quiet.  That way there would always
> be a quiet NaN to 'land on' when one quiets
> some valid signalling NaN.
>
> So the all 1's NaN would not do.  It had to
> be something else.  It had to be something
> that had ones in some places (where the
> exponent was) & at least one zero elsewhere
> (the signalling bit).
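>
> (In C terms, with the usual binary32 layout:
> clear the single set significand bit of such
> a NaN and all that is left is the exponent,
> which is the encoding of infinity.)
>
>     #include <stdint.h>
>     #include <stdio.h>
>     #include <string.h>
>
>     static void show(uint32_t bits) {
>         float f;
>         memcpy(&f, &bits, 4);
>         printf("0x%08X -> %g\n", (unsigned)bits, (double)f);
>     }
>
>     int main(void) {
>         show(0xFFFFFFFF);   /* all ones: quiet NaN under the 1-means-quiet rule  */
>         show(0x7FC00000);   /* exponent all ones, only the top significand bit   */
>         show(0x7F800000);   /* that one bit cleared: you have landed on infinity */
>         return 0;
>     }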
>
> But, single precision floating-point numbers
> have 8-bit exponents.  Doubles have 11.  And
> quads have 15.  In each case the position of
> the signalling bit is (recommended to be) 2
> bits to the right of the right most exponent
> bit.  Counting the sign bit (just for byte
> alignment) that means that 11, 14, or 18 bits
> matter.  The rest don't.
>
> But that means that we have to fill memory
> with a value that presumes we know which type
> will be incorrectly referenced there.
>
> How can we know that?
>
> Further, some systems align 16, 32, & 64 bit
> memory references on 16, 32, or 64 bit aligned
> memory locations.  Some don't.
>
> So not only do we have to know which type will
> be incorrectly referenced, we have to know its
> memory alignment.
>
> If we get either one wrong, the bit pattern
> will just look like some otherwise innocent
> floating-point number.
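>
> (A C sketch, assuming a little-endian
> machine: a binary32 sNaN pattern repeated to
> fill a double decodes as a huge but perfectly
> valid number, and a binary64 sNaN read back
> as two floats decodes as a zero and a quiet
> NaN.)
>
>     #include <stdint.h>
>     #include <stdio.h>
>     #include <string.h>
>
>     int main(void) {
>         uint64_t two_snanf = 0x7FA000007FA00000ULL;  /* binary32 sNaN, twice  */
>         double d;
>         memcpy(&d, &two_snanf, 8);
>         printf("as a double:   %g\n", d);            /* ~5.6e306, not a NaN   */
>
>         uint64_t snand = 0x7FF4000000000000ULL;      /* binary64 sNaN pattern */
>         float halves[2];
>         memcpy(halves, &snand, 8);
>         printf("as two floats: %g %g\n",
>                (double)halves[0], (double)halves[1]); /* 0 and a quiet NaN    */
>         return 0;
>     }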
>
> As the reference is presumed to be incorrect in
> the first place, how can we know how or why it
> is incorrect?
>
>
> Let's see,  I may have missed something but I
> think that's most of it.
>
> Some of them may not apply to your computer.
>
> But I guarantee some of them do.
>
> So...
>
> -- We can't count on systems triggering
> an invalid trap if they encounter a
> signalling NaN because loads are not
> required to be typed.
>
> -- We can't count on knowing when we are
> touching a NaN because copies (& negates)
> are not required to go through the ALU.
>
> -- Even if we only count on trapping on
> arithmetic operations, some (like negate)
> are optimized out.
>
> -- We can't fill memory with all 1's
> because that is a quiet NaN.
>
> -- The kernel people won't fill it with
> any more complex pattern because it is
> noticeably slower to do so.
>
> -- Even if they would, we could only
> catch invalid references to a NaN of a
> known type & alignment.  All others
> would slip through as some other number.
>
> When all was said & done, the remaining diagnostic
> value of what could be done if you met all these
> limitations was considered to be of far less value
> than the limitations themselves.
>
> So we had to give up on it.
>
> Still, some enterprising compiler writer or debugger
> writer out there COULD do something along these lines.
> It wouldn't buy them much but it would be interesting.
>
> So, I ask again: Please name anyone who is doing this.
>
> I'd really like to talk to them about it.
>
> So would Prof Kahan.
>
> <End of long sad story>
>
>
> Dan

Dr. George F. Corliss
Electrical and Computer Engineering
Marquette University
P.O. Box 1881
1515 W. Wisconsin Ave
Milwaukee WI 53201-1881 USA
414-288-6599; GasDay: 288-4400; Fax 288-5579
George.Corliss@xxxxxxxxxxxxx
www.eng.mu.edu/corlissg