The fourth meeting of the IEEE 754R revision group was held Wednesday 11 April 2001 at 1:00 pm at Network Appliance, Bob Davis chair. Attending were Joe Darcy, Bob Davis, Dick Delp, David Hough, David James, Rick James, W Kahan, Ren-Cang Li, Alex Liu, Jason Riedy, David Scott, Dan Zuras.
A mailing list has been established for this work. Send a message "subscribe stds-754" to majordomo@ieee.org to join. Knowledgeable persons with a desire to contribute positively toward an upward-compatible revision are encouraged to subscribe and participate.
The next meeting is scheduled for Weds 16 May at UC Berkeley, 3-7 PM, room to be announced. The subsequent meeting will be Weds June 20, 1-5 PM, Network Appliance bldg 2, Craftsman conference room.
The draft minutes of the previous meeting were corrected: the error bound for matrix multiply is hardly any better for a specific cache blocking than for the class of all cache-blocking algorithms, but the performance is much different. Minutes will be posted in a public area of the IEEE website to be announced.
Scope and Purpose: The existing purpose of the 754R effort refers to identical results. This has never been a universal literal aim of 754 or 854; taken literally, it runs afoul of various good and bad hardware and software features within systems that are still considered to conform to 754. From the IEEE's point of view, the committee can edit its scope and purpose for clarity, but orthogonal changes would require approval.
After some discussion we came up with a new scope and purpose:
SCOPE: This standard specifies formats and methods for floating-point arithmetic in computer programming environments: standard and extended functions with single, double, extended, and extendable precision, and recommends formats for data interchange. Exception conditions are defined and default handling of these conditions is specified.
PURPOSE: This standard provides a discipline for performing floating-point computation that yields results independent of whether the processing is done in hardware, software, or a combination of the two. For operations specified in this standard, numerical results and exceptions are uniquely determined by the values of the input data, sequence of operations, and destination formats, all under programmer control.
The mention of "exceptions" reminded us that these are not defined in the glossary; neither are traps. A very elegant picture depicts the boundaries of arithmetic where exceptions arise; rendering that into a gif or jpg for inclusion in the minutes is an exercise for some student in CS279. Some more discussion produced a formalization of Kahan's epigrammatic "an exception is any situation where no matter what policy you adopt, somebody is bound to take exception."
EXCEPTION: An event that occurs when an operation has no outcome suitable for every reasonable application.
TRAP: A change in control flow that occurs as a result of an exception (not all exceptions trap).
Finally, to conform to 854's better usage, all occurrences of "denormalized" were updated to "subnormal."
Precision beyond quad: The discussion turned on several questions: whether hardware precisions beyond quad will be needed or feasible in the lifetime of this standard; whether, even if they are, we could predict how they should be formatted, even to the extent of defining the size of the exponent field; whether there is even one format rule that would suit a sufficiently large class of applications, or whether the relatively few applications requiring large fixed floating-point formats would all require DIFFERENT ones; and whether these formats will more typically be decimal than binary. Nobody was prepared to propose that a variable floating-point format be standardized now: to be efficient on various current hardware, such a format would likely be customized to that hardware as well as to the intended application.
Given these uncertainties, Kahan felt it unwise to add requirements to the standard in this area that would likely be ignored and would have the unfortunate effect of encouraging implementers to ignore other, more important parts.
The resolution is that an informative appendix will be drafted giving NAMES to various preferred higher precisions at sizes of 2^n and 3/2*2^n. That size progression has a naturalness - successive sizes grow by factors of 3/2 and 4/3, both reasonably close to sqrt(2) - that is also revealed in typical patterns of Gaussian quadrature. [Question: will those named formats have a particular exponent field formula or a range of allowable exponent sizes? Is the exponent field size as important as rules for rounding and exceptions?]
FMA and NaN: The question is what to do about 0 * inf + qNaN. Which NaN should be returned? Should invalid be signaled? Conclusion: return the existing NaN, and do not signal invalid.
Two NaNs: Whether to prescribe which NaN to return depends on whether there is any difference among NaNs. The original intent, seldom attempted, was that the significand of NaNs would convey some information about how they arose, best by an index into a table of more elaborate information about exceptional operations or uninitialized data. Assuming that indices were assigned sequentially, a smaller index would represent an earlier and probably more interesting event. Therefore we decided that when faced with returning one of two NaNs, an implementation SHOULD return a NaN with the smaller significand - the fraction field viewed as an unsigned integer - ignoring the sign, exponent, and integer bit usually used to distinguish quiet from signaling NaNs. The resulting sign bit is not defined; the resulting NaN is quiet.
Correctly rounded base conversion: Reviewing Hough's proposed changes in wording, we noted 1) a need for a reference to a published correct rounding algorithm; 2) a need to extend the tables of exponent and significand requirements to cover double-extended and quad; and 3) the question of what "correctly rounded" means in a decimal destination format (if this is a problem, it has been a problem since 754). Hough will work on this some more.
Extended rounding precision: Joe Darcy provided wording to deprecate x86 style in favor of mc68k style - in which rounding precision also limits exponent range so that extended-based architectures can faithfully and economically emulate non-extended-based architectures. A footnote should reference Golliver's work that tries to accomplish this as economically as possible on x86. See the note below from Alexander Driker and Kahan's response. This issue was also a basis for much contention at a certain point in the evolution of Java.
Date: Mon, 26 Mar 2001 20:22:34 -0800
From: Alexander Driker
Organization: ST
To: wkahan@cs.berkeley.edu
Subject: x87 question

Dr. Kahan:

I am observing something that I believe is incongruous, perhaps you can validate (or refute) my thoughts. I am getting different results when a single multiply is done in larger (extended or double) precision internally and then stored (converted) as single precision versus the same operation when done in single precision internally. Only one arithmetic operation is performed and I believe that all of these alternatives should match. I tried this on SPARC and it works as I expected. I am attaching a simple "C" program (compilable using BCC and SPARC gcc) which illustrates these cases.

Thank you,
Alex Driker

===================================================
filename="multst.c"

#define BCC
#include <stdio.h>
#ifdef BCC
#include <float.h>
#endif

#define SINGLE 0
#define DOUBLE 1
#define EXTEND 2
#define RESERV 3

//========================================
//
//========================================
#ifdef BCC
void set_precision(int prec)
{
    // Bits 8-9 of the x87 control word select the rounding precision.
    switch(prec) {
    case SINGLE : _control87( (0<<8), 0x0300); break;
    case DOUBLE : _control87( (2<<8), 0x0300); break;
    case EXTEND : _control87( (3<<8), 0x0300); break;
    case RESERV : _control87( (1<<8), 0x0300); break;
    default: printf("Set_precision: No such precision\n");
    }
    printf(" cw = %x\n", _control87( 0, 0));
}
#endif

//========================================
//
//========================================
#ifdef BCC
void main()
#else
int main()
#endif
{
    float opa, opb, res;
    double opad, opbd, resd;
    unsigned int *pinta, *pintb, *pintr;
    unsigned int *pintad, *pintbd, *pintrd;

    pinta = (unsigned int*)&opa;
    pintb = (unsigned int*)&opb;
    pintr = (unsigned int*)&res;
    pintad = (unsigned int*)&opad;
    pintbd = (unsigned int*)&opbd;
    pintrd = (unsigned int*)&resd;

    // First set of operands
    *pinta = 0x197e02f5;
    *pintb = 0x26810000;

#ifdef BCC
    set_precision(EXTEND);
    res = opa * opb;
    printf("opA=%8.8x opB=%8.8x res=%8.8x\n", *pinta, *pintb, *pintr);
#endif

#ifdef BCC
    set_precision(DOUBLE);
#endif
    opad = (double)opa;
    opbd = (double)opb;
    res = (float)(opad * opbd);
    printf("opA=%8.8x opB=%8.8x res=%8.8x\n", *pinta, *pintb, *pintr);

#ifdef BCC
    set_precision(SINGLE);
#endif
    res = opa * opb;
    printf("opA=%8.8x opB=%8.8x res=%8.8x\n", *pinta, *pintb, *pintr);

    // Second set of operands
    *pinta = 0xa100000d;
    *pintb = 0x9e800008;

#ifdef BCC
    set_precision(EXTEND);
    res = opa * opb;
    printf("opA=%8.8x opB=%8.8x res=%8.8x\n", *pinta, *pintb, *pintr);
#endif

#ifdef BCC
    set_precision(DOUBLE);
#endif
    opad = (double)opa;
    opbd = (double)opb;
    res = (float)(opad * opbd);
    printf("opA=%8.8x opB=%8.8x res=%8.8x\n", *pinta, *pintb, *pintr);

#ifdef BCC
    set_precision(SINGLE);
#endif
    res = opa * opb;
    printf("opA=%8.8x opB=%8.8x res=%8.8x\n", *pinta, *pintb, *pintr);
}

===========================================================================

Examples submitted by Alex Driker, 26 March 2001
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

                         Operand in Hex   Hex Significand   Dec. Exponent
    a1:                  197e02f5         1.fc05ea           -77
    b1:                  26810000         1.020000           -50
    p1 = exact product                    1.fffdf5d4        -127
    [p1] to 24 sig. bits                  1.fffdf6          -127
    store [p1] as float                   0.fffefb          -126
    [p1]:                00fffefb
    p1 denormalized                       0.fffefaea        -126
    {p1} rounded to float                 0.fffefa          -126
    {p1}:                00fffefa

Summary: If the product is computed exactly and then rounded to 24 sig. bits as if exponent range were unlimited, the result [p1] can be denormalized and stored as a float. This is what happens when x87's precision control is set to round to 24 sig. bits in the floating-point stack registers. But if the product is computed exactly and then denormalized to stay within float's exponent range, and then rounded as a float to (now) 23 sig. bits, the result {p1} must differ from [p1] in its last bit stored. {p1} is what SPARCs and MIPS get. To get {p1} from an x87, leave precision control at 53 or 64 sig. bits so that rounding occurs at a FST (store) operation interposed after every arithmetic operation. Java does this, but BCC does not. I prefer BCC's way.

                         Operand in Hex   Hex Significand   Dec. Exponent
    a2:                  a100000d         -1.00001a          -61
    b2:                  9e800008         -1.000010          -66
    p2 = exact product                    1.00002a0001a     -127
    [p2] to 24 sig. bits                  1.00002a          -127
    store [p2] as float                   0.800015          -126
    [p2]:                00800015
    p2 denormalized                       0.8000150000d     -126
    {p2} rounded to float                 0.800016          -126
    {p2}:                00800016

Exercise for the Diligent Student: Find operands a3 and b3 for which SPARC's once-rounded {p3} is closer to the exact product p3 than is the x87's twice-rounded [p3], albeit by less than one unit in the last place.

Discussion: IEEE 754 requires that any implementor of all three floating-point formats, namely single, double, and double-extended, provide precision-control modes that allow results destined for that last and widest format to be rounded to the narrower format's precision if the programmer so desires. This allows that implementation to mimic those implementations lacking the third format by getting the same results unless over/underflow occurs.

IEEE 754 allows the implementor to choose whether to mimic the narrower exponent range too when rounding to a narrower precision in a wider destination register. The Motorola 68881, designed in 1981, chose to restrict exponent range to match precision when narrowed by precision control. This allowed the 68881 easily to mimic less well endowed machines perfectly.

The Intel 8087, designed in 1977 before IEEE 754 was promulgated, chose to keep the wider destination registers' exponent range. My motive for this choice was to provide Intel's customers with valid results when the mimicked machine got only error-messages or invalid results because of intermediate over/underflow in an expression that the Intel machine could evaluate as the programmer intended in its wider floating-point registers. Things did not work out as I planned.
Most compilers on the x86/x87 supported the third floating-point format either grudgingly or not at all (Microsoft did both) while using it surreptitiously to evaluate only expressions deemed simple enough by the compiler. Part of the trouble can be blamed upon a design error perpetrated in Israel that transformed the handling of floating-point register spill on the x87 from painless to awful. Thus, what should have been an advantage for users of portable computer software on x87s turned into several nasty nuisances for software developers and testers.

Most of the nuisances arise from the compilers' (mis)use of the x87's ill-supported third format, and would go away if programmers could specify what they desire (and get it) as C 9X provides. But one nuisance persists, and it snagged Alex Driker: Intel's x87 precision control mimics imperfectly the underflowed intermediate results obtained by less well endowed machines unless extra stores and loads (among other things) are insinuated into the instruction stream. The imperfection, the difference between one rounding and two, is tinier than the tiniest nonzero number; but it is still a difference with which software developers and testers have to contend.

Back in 1977 this nuisance seemed inconsequential to me compared with the prospect of getting intended results instead of spurious over/underflows in intermediate expressions. If only I had known then what I know now!

The Intel/Hewlett-Packard Itanium allows programmers to mimic less well endowed machines either the x87's way or the 68881's way. I expect the latter to become the only way in the long run, perhaps after I'm dead.

Prof. W. Kahan