Minutes from 754R meeting 16 May 2001

David Hough

The fifth meeting of the IEEE 754R revision group was held Wednesday 16 May 2001 at 3:00 pm at University of California Berkeley Soda Hall, Bob Davis chair. Attending were David Bindel, Joe Darcy, Bob Davis, Jim Demmel, David Hough, W Kahan, Ren-Cang Li, Alex Liu, Peter Markstein, Michael Parks, Tom Pittman, Jason Riedy, David Scott, Jim Thomas, Neil Toda, Dan Zuras.

A mailing list has been established for this work. Send a message "subscribe stds-754" to majordomo@ieee.org to join. Knowledgeable persons with a desire to contribute positively toward a substantially upward-compatible revision are encouraged to subscribe and participate.

The next meeting is scheduled for Wednesday June 20, 1-5 PM, Network Appliance Building 2, Craftsman conference room. The next meeting after that might be a Thursday afternoon in August at Berkeley.

The draft minutes of the previous meeting were approved.

ARITH conference, June 11-13, Vail CO - Hough has proposed a workshop discussion on 754R goals and progress, and asked for volunteers to attend the conference and present positions on some aspect of the deliberations. Travel freeze panic has gripped the industry as a whole, but even so Michael Parks and Ren-Cang Li are planning to attend, though not willing yet to commit to speaking. Kahan offered to provide a written position but won't be there to present it (this is an opportunity for a student). Paul Zimmermann volunteered subsequently by email to participate in the area of transcendental function standardization.

Names of larger formats - Kahan has a proposal for an appendix - http://www.cs.berkeley.edu/~wkahan/ieee754status/Names.pdf. The abstract:

I lack the courage to impose names upon floating-point formats of diverse widths, precisions, ranges and radices, fearing the damage that can be done to one's progeny by saddling them with ill-chosen names. Still, we need at least a list of the things we must name; otherwise how can we discuss them without talking at cross-purposes? These things include ... Wordsize, Width, Precision, Range, Radix, etc., all of which must be both declarable by a computer programmer, and ascertainable by a program through Environmental Inquiries. And precision must be subject to truncation by Coercions.

The proposal is a scheme for programs to request a minimum amount of precision and exponent range, and find out what precision and exponent range were actually allocated. What about expression evaluation involving mixed precisions, some fully supported in hardware, others not?
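As a rough illustration of such a scheme, the sketch below picks the narrowest format that satisfies a requested minimum precision and exponent range, and reports what was actually allocated. The names Format, AVAILABLE and select_format are hypothetical and are not taken from Kahan's proposal; only the precision and maximum-exponent figures for binary single, double and quadruple are standard values.

```python
from typing import NamedTuple, Optional

class Format(NamedTuple):
    name: str
    precision: int  # significand bits, including the implicit leading bit
    emax: int       # largest unbiased binary exponent

# Hypothetical list of formats an implementation might offer, narrowest first.
AVAILABLE = [
    Format("single", 24, 127),
    Format("double", 53, 1023),
    Format("quadruple", 113, 16383),
]

def select_format(min_precision: int, min_emax: int) -> Optional[Format]:
    """Return the narrowest available format meeting both minimums, or None."""
    for fmt in AVAILABLE:
        if fmt.precision >= min_precision and fmt.emax >= min_emax:
            return fmt
    return None

# The program asks for at least 30 bits and exponents up to 100,
# then inquires what it was actually given.
granted = select_format(30, 100)
print(granted.name, granted.precision, granted.emax)
```

The sketch says nothing about the open question of mixed-precision expression evaluation; it only shows the request-then-inquire pattern.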

Radix ten floating point - proposal by Hough - http://www.validlab.com/754R/hough-changes-4.html . The proposal simply removes all the redundant sections from 854 and retains and numbers in parallel the sections containing unique content for radix ten. This was accepted by consensus. Proposed changes for Annex C: Remove all sections except C.3.1 through C.3.3; renumber those to C.4.1 through C.4.3 to correspond to the main text; add this new introductory paragraph:

The following sections C.4.1 through C.4.3 explain how the principles of the binary floating point standard sections 4.1 through 4.3 SHOULD be extended to accommodate floating-point arithmetic based on radix ten. All other aspects of the binary standard apply equally well to radix ten.

Underflow - proposal by Hough - http://www.validlab.com/754R/hough-changes-4.html . The intent of this proposal was to have a uniform definition of underflow - not depending on whether the trap was enabled, not depending on rounding modes, and allowing no latitude among implementations - all these had proven to be confusion factors since 754 days, and subsequent discussion proved that even the experts had significantly different understandings of the meaning of that standard in this area.

We quickly agreed that having the same underflow definition whether or not the trap is enabled is a good idea for implementation design and testing, but it entails a compromise: if underflow is independent of inexactness, then untrapped exact subnormal results will raise gratuitous underflow flags; if underflow depends on inexactness, then enabling the underflow trap will not catch all subnormal results, namely the exact ones. We recalled the history and the fears surrounding subnormal results twenty years ago, which have mostly proven unfounded, and weighed them against the desire that exact subnormal scaling factors raise no exception. We decided that the preferable compromise was to preserve the latter behavior, so that underflow is signaled only when the result is or would be inexact. (scalb would be an unexceptional way of scaling, but in all implementations so far it is unacceptably slow compared to multiplication.)

We noted in passing that the IDEAL definition of underflow, a result different from that produced by finite precision and unbounded exponent, had never been implemented in hardware and so perhaps was unlikely to be implemented. Checking inexact instead is a much more practical approximation to loss of accuracy.

We further discussed whether the better definition is to detect underflow on the infinitely precise result before rounding, or on the subnormal result after rounding. The former is easier to state and test but produces underflow signals on what are really unexceptional minimum normalized results; the latter is easier to explain to end users and probably easier to implement in hardware, although there was some uncertainty whether these differences are really significant in the whole scheme of things. In any event the only difference is whether the underflow flag is set or whether the trap is taken for cases near the threshold; no practical application could ever tell which scheme was in effect. A majority favored testing after rounding, so a proposal to that effect is published at http://www.validlab.com/754R/hough-changes-6.html in the hope that implementors with substantial reasons for preferring one to the other will speak up. Proposed text for section 8.4:

Underflow SHALL be signaled by way of the underflow flag, or trapped if the underflow trap has been implemented and enabled, when a result is subnormal and inexact. Thus underflow detection does not depend on the underflow trap enable state; underflow is never signaled for an exact subnormal result; no underflow trap occurs for an exact subnormal result. Trapped underflow on all operations except conversion SHALL deliver to the trap handler the result obtained by multiplying the infinitely precise result by 2^alpha and then rounding. The bias adjust alpha is 3x2^(n-2), where n is the number of bits in the exponent field: thus alpha is 192 for single, 1536 for double, and 24576 for quadruple. Trapped underflows on conversion SHALL be handled analogously to the handling of trapped overflows on conversion.
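The effect of the proposed wording can be sketched in software. The following is only an emulation for IEEE double multiplication in round-to-nearest, not an implementation of any hardware flag: the names multiply and bias_adjust are hypothetical, exact rational arithmetic stands in for the infinitely precise result, and treating a nonzero product that rounds to zero as tiny is an assumption of the sketch. It also assumes the platform delivers gradual underflow rather than flushing to zero.

```python
import math
import sys
from fractions import Fraction

MIN_NORMAL = sys.float_info.min      # 2**-1022, smallest normal double
ULP = math.ldexp(1.0, -1074)         # spacing of subnormal doubles

def multiply(x, y):
    """Return (product, underflow) with underflow tested after rounding."""
    exact = Fraction(x) * Fraction(y)            # infinitely precise result
    rounded = x * y                              # delivered result
    tiny = exact != 0 and abs(rounded) < MIN_NORMAL
    inexact = Fraction(rounded) != exact
    return rounded, tiny and inexact

def bias_adjust(exponent_bits):
    """alpha = 3 x 2^(n-2) for an n-bit exponent field."""
    return 3 * 2 ** (exponent_bits - 2)

# Exact subnormal result: no underflow signaled.
print(multiply(MIN_NORMAL, 0.5))
# Subnormal and inexact: underflow signaled.
print(multiply(MIN_NORMAL + ULP, 0.5))
# Exact product lies just below the threshold but rounds up to the minimum
# normal number: testing after rounding signals nothing.
print(multiply(MIN_NORMAL - ULP, 1.0 + 2.0 ** -52))
print(bias_adjust(8), bias_adjust(11), bias_adjust(15))
```

The third case is one of the threshold cases on which testing before rounding and testing after rounding disagree.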

Correctly-rounded base conversion - proposal by Hough - http://www.validlab.com/754R/hough-changes-4.html .

We agreed that 754's exclusion of conversion between extended format and decimal was a mistake, so that exclusion should be removed from section 1.2.2.

Sign preservation including sign of zero was spelled out in a separate sentence.

We agreed that Fortran print * and the like should produce enough output to distinguish binary numbers, whether by using the magic constants in Table 2 or by using Steele's algorithm for minimal digits as incorporated into Java.
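The two approaches can be compared in a few lines. In this sketch Python's repr serves as a stand-in for a minimal-digits conversion (it is not claimed to be Steele's algorithm itself), and 17 significant digits is the fixed count that always suffices to distinguish doubles; 15 digits is shown as a count that does not.

```python
x = 0.1 + 0.2                 # a double that is not exactly 0.3

shortest = repr(x)            # fewest digits that still read back as x
fixed17 = "%.17g" % x         # fixed digit count sufficient for double
fixed15 = "%.15g" % x         # too few digits to distinguish x from 0.3

for text in (shortest, fixed17, fixed15):
    # Report whether reading the string back recovers the original value.
    print(text, float(text) == x)
```

The minimal-digits form is never longer than the 17-digit form and often much shorter, which is the attraction for list-directed output.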

We agreed that getting compilers to interpret manifest constants in programs at run time was a lost cause, but that explicit invocations of conversion functions (like printf/scanf, or Fortran formatted I/O) should always produce the same results as if processed at run time, observing rounding modes and producing exceptions.

My attempt to spell out in detail the implications of various styles of output specifications was not deemed helpful, so I condensed it into fewer, more general, words, at http://www.validlab.com/754R/hough-changes-6.html .

String representations of NaNs and Infinities - proposal by Darcy.

The requirements for conversion of numbers to decimal strings ensure that floating-point data may be output in decimal, then read back in to recover the original data. Should the same be true of Infinities and NaNs? Unlike numbers, NaN data may have different meaning on different systems - should the external format attempt to convey the "meaning" or simply the hex pattern of the fraction bits?

We have agreed that the most significant fraction bit differentiates types of NaNs - 1 for quiet, 0 for signaling, corresponding to most current hardware implementations.

We have not yet specified conversions of NaNs between floating-point formats - conversion of a quiet NaN from single to double and back to single probably should preserve the binary pattern, but 754 does not so require. And what about conversion from double to single - are the least or most significant fraction bits truncated? What about sign bits of NaNs? 754 says nothing about them.

What about input of representations of signaling NaN? Should that signal? Or should the signal be deferred until arithmetic is performed on the NaN - "trivial" operations like copy, abs, negate don't usually signal. Consensus was that reading in a signaling NaN in binary doesn't signal, so neither should it in decimal.

If we have representations like sNaN and qNaN, or NaNs and NaNq, then what does "NaN" mean - quiet or signaling? Majority favored quiet.
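To make the questions concrete, here is a small sketch for IEEE double. It assumes the quiet/signaling distinction lives in the most significant fraction bit (1 for quiet), and the helper name classify is hypothetical. Python's own text form is used only as an example of an existing external representation: it writes every NaN as "nan", so it conveys neither the fraction bits nor the quiet/signaling distinction, while Infinities survive the round trip with their sign.

```python
import math

EXP_MASK = 0x7FF << 52        # all exponent bits set: Infinity or NaN
FRAC_MASK = (1 << 52) - 1     # the 52 fraction bits
QUIET_BIT = 1 << 51           # most significant fraction bit (assumed: 1 = quiet)

def classify(bits):
    """Classify a 64-bit double pattern as 'qNaN', 'sNaN' or 'not a NaN'."""
    if (bits & EXP_MASK) != EXP_MASK or (bits & FRAC_MASK) == 0:
        return "not a NaN"
    return "qNaN" if bits & QUIET_BIT else "sNaN"

print(classify(0x7FF8000000000000))   # quiet bit set
print(classify(0x7FF0000000000001))   # quiet bit clear, nonzero fraction
print(classify(0x7FF0000000000000))   # zero fraction: Infinity

# Text round trip: the value class and the sign of Infinity are recovered,
# but nothing else about a NaN is.
for text in ("nan", "inf", "-inf"):
    value = float(text)
    print(text, "->", repr(value))
```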

Controlling FMA usage - reported by Darcy.

There are three policies a programmer might want with regard to fused mac: fused mac must *not* be used, fused mac *must* be used, and fused mac may be used if it is available and convenient. It seems natural to implement the "must be used" policy via an "fmac" or "fma" function call, as is available in C99. Implementing the other two options depends on language semantics. For example, Java forbids substituting fused mac for multiply and add operations. In contrast, in a C or Fortran environment on a platform with fused mac, whether or not fused mac gets used for straight code will probably depend on compiler flags. C99 has a pragma (FP_CONTRACT) to specify whether or not fused macs may be used. When fmacs are being substituted for multiplies and adds, whether

a*b - c*d
is converted to
fmac(a, b, -(c*d))
or
fmac(-c, d, a*b)
is generally left to the compiler's discretion. However, it is feasible that an optimizer could be constrained to group operations in a way predictable from the source.
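Why the choice matters can be seen with a small emulation. The fma function below is a hypothetical software stand-in that computes a*b + c exactly with rational arithmetic and rounds once to double; it is not a hardware instruction or a library call. With the subtraction written out with explicit signs, the unfused expression and the two possible contractions give three different answers for the same operands.

```python
from fractions import Fraction

def fma(a, b, c):
    """Emulated fused multiply-add: a*b + c with a single rounding."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))

# Operands chosen so that a*b is not exactly representable in double:
# (1 + 2**-27)**2 = 1 + 2**-26 + 2**-54, and the last term is rounded away.
a = b = c = d = 1.0 + 2.0 ** -27

unfused = a * b - c * d               # both products rounded, then subtracted
fused_first = fma(a, b, -(c * d))     # a*b kept exact, c*d rounded
fused_second = fma(-c, d, a * b)      # c*d kept exact, a*b rounded

print(unfused, fused_first, fused_second)
```

Mathematically the expression is zero, which only the unfused form delivers here; each contraction exposes the rounding error of whichever product was computed separately, with opposite signs.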

These fused mac issues are briefly discussed in Borneo 1.0.2 section 6.6.
