Minutes from 754R meeting 11 April 2001

David Hough

The fourth meeting of the IEEE 754R revision group was held Wednesday 11 April 2001 at 1:00 pm at Network Appliance, Bob Davis chair. Attending were Joe Darcy, Bob Davis, Dick Delp, David Hough, David James, Rick James, W Kahan, Ren-Cang Li, Alex Liu, Jason Riedy, David Scott, Dan Zuras.

A mailing list has been established for this work. Send a message "subscribe stds-754" to majordomo@ieee.org to join. Knowledgeable persons with a desire to contribute positively toward an upward-compatible revision are encouraged to subscribe and participate.

The next meeting is scheduled for Weds 16 May at UC Berkeley, 3-7 PM, room to be announced. The subsequent meeting will be Weds June 20, 1-5 PM, Network Appliance bldg 2, Craftsman conference room.

The draft minutes of the previous meeting were corrected: the error bound for matrix multiply is hardly any better for a specific cache blocking than for the class of all cache-blocking algorithms, but the performance is much different. Minutes will be posted in a public area of the IEEE website to be announced.

Scope and Purpose. The existing purpose of the 754R effort refers to identical results. This has never been a universal literal aim of 754 or 854 - running afoul of various good and bad hardware and software features within systems that are still considered to conform to 754. From the IEEE's point of view, the committee can edit its scope and purpose for clarity, but orthogonal changes would require approval.

After some discussion we came up with a new

SCOPE: This standard specifies formats and methods for floating-point arithmetic in computer programming environments: standard and extended functions with single, double, extended, and extendable precision, and recommends formats for data interchange. Exception conditions are defined and default handling of these conditions is specified.
PURPOSE: This standard provides a discipline for performing floating-point computation that yields results independent of whether the processing is done in hardware, software, or a combination of the two. For operations specified in this standard, numerical results and exceptions are uniquely determined by the values of the input data, sequence of operations, and destination formats, all under programmer control.

The mention of "exceptions" reminded us that these are not defined in the glossary; neither are traps. A very elegant picture depicts the boundaries of arithmetic where exceptions arise; rendering that into a gif or jpg for inclusion in the minutes is an exercise for some student in CS279. Some more discussion produced a formalization of Kahan's epigrammatic "an exception is any situation where no matter what policy you adopt, somebody is bound to take exception."

EXCEPTION: An event that occurs when an operation has no outcome suitable for every reasonable application.
TRAP: A change in control flow that occurs as a result of an exception (not all exceptions trap).
Finally, to conform to 854's better usage, all occurrences of "denormalized" were updated to "subnormal."

Precision beyond quad: The discussion turned on whether hardware precisions beyond quad would be needed or feasible in the lifetime of this standard; whether, even if they were, we could predict how they should be formatted, even to the extent of defining the size of the exponent field; and whether there is even one format rule that would be suitable for a sufficiently large class of applications - or whether the relatively few applications requiring large fixed floating-point formats would all require DIFFERENT ones - and whether these formats will more typically be decimal than binary. Nobody was prepared to propose that a variable floating-point format be standardized now - to be efficient on various current hardware, such a format would likely be customized to that hardware as well as to the intended application.

Given these uncertainties, Kahan felt it unwise to add requirements to the standard in this area; they would likely be ignored and would have the unfortunate effect of encouraging implementers to ignore other, more important parts.

The resolution is that an informative appendix will be drafted giving NAMES to various preferred higher precisions at sizes of 2^n and 3/2*2^n. That size progression has a naturalness - being reasonably close to sqrt(2) - that is also revealed in typical patterns of Gaussian quadrature. [Question: will those named formats have a particular exponent field formula or a range of allowable exponent sizes? Is the exponent field size as important as rules for rounding and exceptions?]

FMA and NaN: The question is what to do about 0 * inf + qNaN. Which NaN should be returned? Should invalid be signaled? Conclusion: return the existing NaN, and do not signal invalid.

Two NaNs: Whether to prescribe which NaN to return depends on whether there is any difference among NaNs. The original intent, seldom attempted, was that the significand of NaNs would convey some information about how they arose, best by an index into a table of more elaborate information about exceptional operations or uninitialized data. Assuming that indices were assigned sequentially, a smaller index would represent an earlier and probably more interesting event. Therefore we decided that when faced with returning one of two NaNs, an implementation SHOULD return a NaN with the smaller significand - the fraction field viewed as an unsigned integer - ignoring the sign, exponent, and integer bit usually used to distinguish quiet from signaling NaNs. The resulting sign bit is not defined; the resulting NaN is quiet.

Correctly rounded base conversion: Reviewing Hough's proposed changes in wording, we noted 1) a need for a reference to a published correct-rounding algorithm, 2) a need to extend the tables of exponent and significand requirements to cover double-extended and quad, and 3) the question of what "correctly rounded" means in a decimal destination format (if this is a problem, it has been a problem since 754). Hough will work on this some more.

Extended rounding precision: Joe Darcy provided wording to deprecate x86 style in favor of mc68k style - in which rounding precision also limits exponent range so that extended-based architectures can faithfully and economically emulate non-extended-based architectures. A footnote should reference Golliver's work that tries to accomplish this as economically as possible on x86. See the note below from Alexander Driker and Kahan's response. This issue was also a basis for much contention at a certain point in the evolution of Java.


The following item illustrates the problems with the x87's extended-precision rounding-precision mode control; it's from http://www.cs.berkeley.edu/~wkahan/CS279/X87QuestionUnderflow on Kahan's class website.

Date: Mon, 26 Mar 2001 20:22:34 -0800
From: Alexander Driker 
Organization: ST
To: wkahan@cs.berkeley.edu
Subject: x87 question

Dr. Kahan:

I am observing something that I believe is incongruous, perhaps you can
validate (or refute) my thoughts.

I am getting different results when a single multiply is done in larger
(extended or double) precision internally and then stored (converted) as
single precision, versus the same operation done in single precision
internally. Only one arithmetic operation is performed, and I believe
that all of these alternatives should match. I tried this on SPARC and
it works as I expected.
I am attaching a simple "C" program (compilable using BCC and SPARC gcc)
which illustrates these cases.

Thank you,

Alex Driker


/* Reconstructed from a garbled listing: the #ifdef BCC blocks guard
   Borland C's x87 precision control (_control87); on SPARC gcc they
   compile away.  The set_precision calls before each multiply are a
   plausible reconstruction of lines lost in transmission. */
#define BCC

#include <stdio.h>

#ifdef BCC
#include <float.h>              /* _control87 */

#define SINGLE 0
#define DOUBLE 1
#define EXTEND 2
#define RESERV 3

void set_precision(int prec)
{
    switch (prec) {
        case SINGLE: _control87((0 << 8), 0x0300); break;
        case DOUBLE: _control87((2 << 8), 0x0300); break;
        case EXTEND: _control87((3 << 8), 0x0300); break;
        case RESERV: _control87((1 << 8), 0x0300); break;
        default:     printf("set_precision: no such precision\n");
    }
    printf(" cw = %x\n", _control87(0, 0));
}
#endif

int main(void)
{
    float opa, opb, res;
    double opad, opbd;
    unsigned int *pinta, *pintb, *pintr;

    pinta = (unsigned int *)&opa;
    pintb = (unsigned int *)&opb;
    pintr = (unsigned int *)&res;

    /* First set of operands */
    *pinta = 0x197e02f5;
    *pintb = 0x26810000;

#ifdef BCC
    set_precision(SINGLE);      /* reconstructed */
#endif
    res = opa * opb;
    printf("opA=%8.8x opB=%8.8x res=%8.8x\n", *pinta, *pintb, *pintr);

#ifdef BCC
    set_precision(DOUBLE);      /* reconstructed */
#endif
    opad = (double)opa;
    opbd = (double)opb;
    res  = (float)(opad * opbd);
    printf("opA=%8.8x opB=%8.8x res=%8.8x\n", *pinta, *pintb, *pintr);

#ifdef BCC
    set_precision(EXTEND);      /* reconstructed */
#endif
    res = opa * opb;
    printf("opA=%8.8x opB=%8.8x res=%8.8x\n", *pinta, *pintb, *pintr);

    /* Second set of operands */
    *pinta = 0xa100000d;
    *pintb = 0x9e800008;

#ifdef BCC
    set_precision(SINGLE);      /* reconstructed */
#endif
    res = opa * opb;
    printf("opA=%8.8x opB=%8.8x res=%8.8x\n", *pinta, *pintb, *pintr);

#ifdef BCC
    set_precision(DOUBLE);      /* reconstructed */
#endif
    opad = (double)opa;
    opbd = (double)opb;
    res  = (float)(opad * opbd);
    printf("opA=%8.8x opB=%8.8x res=%8.8x\n", *pinta, *pintb, *pintr);

#ifdef BCC
    set_precision(EXTEND);      /* reconstructed */
#endif
    res = opa * opb;
    printf("opA=%8.8x opB=%8.8x res=%8.8x\n", *pinta, *pintb, *pintr);

    return 0;
}


      Examples submitted by  Alex Driker,  26 March 2001

    Operand   in  Hex        Hex Significand      Dec. Exponent
     a1:      197e02f5         1.fc05ea                -77
     b1:      26810000         1.020000                -50
     p1 = exact product        1.fffdf5d4             -127
     [p1]  to  24  sig. bits   1.fffdf6               -127
     store  [p1]  as float     0.fffefb               -126
     [p1]:    00fffefb
     p1  denormalized          0.fffefaea             -126
     {p1}  rounded to float    0.fffefa               -126
     {p1}:    00fffefa

     Summary:  If the product is computed exactly and then
        rounded to  24 sig. bits  as if exponent range were
        unlimited,  the result  [p1]  can be denormalized
        and stored as a  float.  This is what happens when
        x87's  precision control is set to round to  24
        sig. bits in the floating-point stack registers.

        But if the product is computed exactly and then
        denormalized to stay within  float's  exponent
        range,  and then rounded as a  float to  (now)
        23  sig. bits,  the result  {p1}  must differ
        from  [p1]  in its last bit stored.  {p1}  is
        what  SPARCs  and  MIPS  get.  To get  {p1}
        from an  x87,  leave precision control at  53
        or  64  sig. bits so that rounding occurs at
        a  FST  (store)  operation interposed after
        every arithmetic operation.  Java  does this,
        but  BCC  does not.  I prefer  BCC's  way.

    Operand   in  Hex        Hex Significand       Dec. Exponent
     a2:      a100000d        -1.00001a                 -61
     b2:      9e800008        -1.000010                 -66
     p2 = exact product        1.00002a0001a           -127
     [p2]  to  24  sig. bits   1.00002a                -127
     store  [p2]  as float     0.800015                -126
     [p2]:    00800015
     p2  denormalized          0.8000150000d           -126
     {p2}  rounded to float    0.800016                -126
     {p2}:    00800016

    Exercise  for the  Diligent Student:  Find operands  a3
    and  b3  for which  SPARC's  once-rounded  {p3}  is closer
    to the exact product  p3  than is the  x87's  twice-rounded
    [p3] ,  albeit by less than one unit in the last place.

    Discussion:  IEEE 754  requires that any implementor of all
    three floating-point formats,  namely  single,  double,  and
    double-extended,  provide precision-control modes that allow
    results destined for that last and widest format to be
    rounded to the narrower format's precision if the programmer
    so desires.  This allows that implementation to mimic those
    implementations lacking the third format by getting the same
    results unless over/underflow occurs.  IEEE 754  allows the
    implementor to choose whether to mimic the narrower exponent
    range too when rounding to a narrower precision in a wider
    destination register.

    The  Motorola 68881,  designed in  1981,  chose to restrict
    exponent range to match precision when narrowed by precision
    control.  This allowed the  68881  easily to mimic less well
    endowed machines perfectly.  The  Intel 8087,  designed in
    1977  before  IEEE 754  was promulgated,  chose to keep the
    wider destination registers' exponent range.  My motive for
    this choice was to provide  Intel's  customers with valid
    results when the mimicked machine got only error-messages or
    invalid results because of intermediate over/underflow in an
    expression that the  Intel  machine could evaluate as the
    programmer intended in its wider floating-point registers.

    Things did not work out as I planned.  Most compilers on the
    x86/x87  supported the third floating-point format either
    grudgingly or not at all  (Microsoft  did both)  while using
    it surreptitiously to evaluate only expressions deemed
    simple enough by the compiler.  Part of the trouble can be
    blamed upon a design error perpetrated in  Israel  that
    transformed the handling of floating-point register spill
    on the  x87  from painless to awful.

    Thus,  what should have been an advantage for users of
    portable computer software on  x87s  turned into several
    nasty nuisances for software developers and testers.

    Most of the nuisances arise from the compilers' (mis)use of
    the  x87's  ill-supported third format,  and would go away
    if programmers could specify what they desire  (and get it)
    as  C9X  provides.  But one nuisance persists,  and it
    snagged  Alex Driker:

       Intel's x87  precision control mimics imperfectly
       the underflowed intermediate results obtained by
       less well endowed machines unless extra stores
       and loads  (among other things)  are insinuated
       into the instruction stream.  The imperfection,
       the difference between one rounding and two,  is
       tinier than the tiniest nonzero number;  but it
       is still a difference with which software
       developers and testers have to contend.

    Back in  1977  this nuisance seemed inconsequential to me
    compared with the prospect of getting intended results
    instead of spurious over/underflows in intermediate
    expressions.  If only I had known then what I know now!

    The  Intel/Hewlett-Packard  Itanium  allows programmers
    to mimic less well endowed machines either the  x87's
    way or the  68881's  way.  I expect the latter to become
    the only way in the long run,  perhaps after I'm dead.

                                             Prof. W. Kahan
