Minutes from 754R meeting 18 July 2001

David Hough

The seventh meeting of the IEEE 754R revision group was held Wednesday 18 July 2001 at 1:00 pm at Network Appliance, Sunnyvale, Bob Davis chair. Attending were David Bindel, Joe Darcy, Bob Davis, Dick Delp, Eric Feng, David Hough, Rick James, W Kahan, Alex Liu, Peter Markstein, Michael Parks, Jason Riedy, David Scott, Jim Thomas, Dan Zuras.

A mailing list has been established for this work. Send a message "subscribe stds-754" to majordomo@ieee.org to join. Knowledgeable persons with a desire to contribute positively toward a substantially upward-compatible revision are encouraged to subscribe and participate. An official website contains an 8 July draft of 754R in pdf format, requiring a password. Jason Riedy's site contains additional information.

The next meeting is scheduled for Wednesday August 15, for 3:00 to 7:00 at UC Berkeley, in 405 Soda Hall. Dr. David Bailey will speak on high precision arithmetic. Further meeting dates are reserved for September 13, October 18, November 15, December 13, January 14, at Network Appliance, Santa Cruz conference room.

The draft minutes of the previous meeting were approved, with the addition of a mention of the difficulties involved in providing correct exceptions for transcendental functions in the complex domain.

We further agreed that we won't attempt to standardize transcendental functions at all in this draft.

We discussed the current state of the draft and decided that it would be better to start over with the current 754 standard and edit that, in a format that would facilitate working on a variety of standard platforms, and also so that change bars could be useful.

We had previously determined that the essence of 854 could be added as a relatively short appendix, though David James discovered some additional areas that require examination beyond the sections originally identified.

1596 material could be added in an appendix, or left as a separate standard. It's too voluminous to merge with 754/854 - swamps the original material.

Although the final draft will be submitted to IEEE in Framemaker format, a copy of the working draft will be in HTML. It's easy to read the source and easy to convert to plain text. A plain text version of 754 is on the IEEE web site. Jason Riedy will maintain an HTML version.

Underflow: Kahan asked for further rationale for Hough's suggested underflow changes.


C99 specifications and discussion

The bulk of the meeting was a presentation by Jim Thomas of C99 specifications and subsequent discussion. Here are his slides:


IEEE 754 Support in C99
-----
C99 Floating-Point
History
IEEE 754 support a charter focus area for Numerical C
 Extensions Group (NCEG)  1989
Participation and consultation from IEEE 754/854 members
NCEG Technical Report -- 1995
NCEG merged into C9x committee
C99 became standard in 1999
-----
C99 Floating-Point Specification
Organization

Basic FP specification for all implementations, not just IEEE
 common API FP and math library
 complex arithmetic and library
Annex F
 additional specification for IEC 60559 (IEEE 754) implementations
 conditionally normative
Annex G
 specification for complex arithmetic for IEC 60559 implementations
 informative
-----
IEEE 754 Binding

Types
float --> IEEE single
double --> IEEE double
long double --> IEEE double extended
                      else non-IEEE wide type, else IEEE double

Operators and functions
+, , *, and / --> IEEE add, subtract, multiply, and divide
sqrt() --> square root
remainder() --> remainder. remquo() same, with low order quotient bits.
rint() --> rounds FP number to integer value (in the same precision)
nearbyint() --> nearbyinteger in 854 appendix
conversions among floating types  --> IEEE conversions among FP precisions
conversions from integer to floating types -->  conversions from integer to
FP
conversions from floating to integer types --> IEEE-style conversions but
    always round toward zero (inexact exception optional)
lrint() and llrint() -->  rounding mode conversions from floating point to
    long int and long long int
-----
IEEE 754 Binding

Operators and functions (cont.)
translation time conversion of floating constants and strtof(), strtod(),
    strtold(), fprintf(), fscanf(), and related library functions -->
    IEEE binary-decimal conversions. strtold() --> conv function in 854
annex.
 Correctly rounded binary decimal conversion is specified and recommended,
and
 in Annex F it's required.
relational and equality operators --> IEEE comparisons.  Macro
    functions isgreater(), isgreaterequal(), isless(), islessequal(),
    islessgreater(), isunordered() --> "quiet" comparisons.
feclearexcept(), feraiseexcept(), fetestexcept() -->  test and
    alter IEEE exception status flags.  fegetexceptflag() and
fesetexceptflag()
    -->  save and restore all status flags (including any auxiliary state).
    Use with type fexcept_t and macros FE_INEXACT, FE_DIVBYZERO,
    FE_UNDERFLOW,  FE_OVERFLOW, FE_INVALID
fegetround() and fesetround() --> select  among  IEEE rounding modes
    Macros FE_TONEAREST,  FE_UPWARD, FE_DOWNWARD,
    FE_TOWARDZERO --> IEEE rounding modes.  Values 0, 1, 2, 3
    of FLT_ROUNDS --> IEEE rounding modes
-----
IEEE 754 Binding

Operators and functions (cont.)
fegetenv(),  feholdexcept(), fesetenv(), and feupdateenv() --> manage
    status flags and control modes, facilitate hiding exceptions
FENV_ACCESS ON pragma, with file or block effect, required for well defined
 behavior of code that reads flags or runs under non default modes
copysign() --> copysign in 754 appendix
unary minus () --> minus () operation in 754 appendix
scalbn() and scalbln() --> scalb in 754 appendix (scalbln() has long int
 second parameter
logb() --> logb in 854 appendix. ilogb() like logb() except returns type int
nextafter() and nexttoward() --> nextafter in appendix, except returns y
 if x = y. nexttoward() doesn't clamp a wide direction argument.
isfinite() macro --> finite function in Appendix. Inquiry macros are type
 generic
isnan() macro --> isnan function in Appendix
signbit() and fpclassify() macros with number  classification
 macros FP_NAN, FP_INFINITE, FP_NORMAL, FP_SUBNORMAL,
 FP_ZERO --> class  function in Appendix (except for signaling NaNs)
-----
IEEE 754 Binding

Special values
INFINITY and NAN macros --> +infinity and a quiet NaN. Usable for
 static and aggregate initialization
sign respected for zero and infinity
I/O supports inf, infinity, nan, nan(n-char-sequence). Interpretation
 of n-char-sequence is implementation defined. Input case insensitive. User
 choice of upper or lower case (INF or inf) for output.
nan() takes "n-char-sequence" argument and constructs NaN at runtime

Not supported (though not disallowed)
trap handling (except with SIGFPE)
signaling NaNs
NaN significands (except for n-char-sequences)
compile-time mode and flag access

tried to specify trap handling and signaling NaNs, but found insufficient
 inspiration, prior art and use, IEEE 754 guidance
-----
IEEE 754 Related Specification

Expression evaluation
Elementary functions
Complex arithmetic
-----
IEEE 754 Related Features
Expression Evaluation


File or block pragma FP_CONTRACT allows or disallows contraction
 optimizations (e.g. fused multiply add synthesis). FP_CONTRACT ON
 can be default.
Other value changing optimizations disallowed
fma() guarantees fused multiply add
3 well-defined expression evaluation methods
 evaluate each operation and floating constant to semantic type
 widen float operations and floating constants to double
 widen float and double operations and floating constants to long double
Evaluation type may be wider than semantic type
 wide evaluation does not widen semantic type
Assignments, casts, and argument passing convert to semantic type
FP_EVAL_METHOD macro identifies method in effect
Implementation may provide any, all, or none of these methods
"Widest-need" evaluation allowed but not specified. Specified in NCEG
 Technical Report but not in C99 because of lack of prior art.
-----
IEEE 754 Related Features
Wide Evaluation


Types float_t and double_t match evaluation types for float and double
Inclusion of  makes type of math function be determined by
 its argument
Wide evaluation example
  #include 
  float x, y, z;
  float_t ss;
  ss = x*x + y*y;
  z = sqrt(ss);
 computes sqrt(x2+y2) entirely in the evaluation type and converts
 to float only in the last assignment
C99-portable code
 uses wide evaluation if available
-----
IEEE 754 Related Features
Elementary Functions

Three fully supported real types:  float <= double <= long double
C89 math library
acos asin atan atan2 cos sin tan cosh sinh tanh exp frexp ldexp log
log10 modf pow sqrt ceil fabs floor fmod

C99 math additions
erf erfc lgamma tgamma hypot acosh asinh atanh cbrt expm1 ilogb log1p
logb nextafter remainder rint isnan isinf signbit isfinite isnormal
fpclassify isunordered isgreater isgreaterequal isless islessequal
islessgreater copysign log2 exp2 fdim fmax fmin nan scalbn scalbln
nearbyint round trunc remquo lrint lround llrint llround fma nexttoward

C99 floating-point environment library
feclearexcept fegetexceptflag feraiseexcept fesetexceptflag fetestexcept
fegetround fesetround fegetenv feholdexcept fesetenv feupdateenv
-----
IEEE 754 Related Features
Elementary Functions  Special Cases

(math_errhandling & MATH_ERREXCEPT) tests for 754-style exception flags
  required by Annex F
(math_errhandling  &  MATH_ERRNO) tests for errno support
Annex F has detailed specification of special cases for real functions
 IEEE 754 meaning of exceptions
 exceptions required only under FENV_ACCESS ON
 functions that are essentially always inexact are not required to raise
inexact
 functions may raise inexact if result is exact (implementation defined)
 functions may raise underflow if result is tiny and exact (implementation
defined)
 functions may or may not honor rounding directions (implementation defined)
 specifies numeric result instead of NaN if numeric result is useful in some
significant
 applications -- CONTENTIOUS
-----
IEEE 754 Related Features
 Elementary Functions  Special Cases

Contentious special cases:
atan2(+-0, 0) = +-pi,  atan2(+-0, +0) = +-0
atan2(+-inf, inf) = +-3pi/4,  atan2(+-inf, +inf) returns +-pi/4
hypot(+-inf, y) = +inf, even if y is a NaN
pow(1, +-inf) = 1
pow(+1, x) = 1 for any x, even a NaN
pow(x, +-0) = 1 for any x, even a NaN
pow(inf, y) = +0 for y<0 and not an odd integer
pow(inf, y) = inf for y an odd integer > 0
pow(inf,  y)  =  +inf  for  y>0  and  not an odd integer
if  just  one  argument  is  a NaN, the fmax and fmin functions return the
other
 argument
-----
IEEE 754 Related Features
Complex Arithmetic

Three complex types:  float complex, double complex, long double complex
Annex G specifies three imaginary types: float, double, and long double
imaginary
Operands not promoted to a common type domain (real, imaginary, complex)
 e.g. r(u + vi) = ru + rvi, not (r + 0i)(u + vi)
 provides natural efficiency and better treatment of special values
 e.g. i i = , not ( + 0i)(0 + i)( + 0i)(0 + i) = NaN + NaNi
Infinity properties
for z nonzero and finite
inf*z=inf    inf*inf=inf inf/z=inf    inf/0=inf
  z/inf=0      0/inf=0     z/0=inf      |inf|=inf
even for complex and imaginary z, 0s, and infinities
a complex value with at least one infinite part is regarded as infinite (even
if
 the other part is NaN)
CX_LIMITED_RANGE ON (file or block) pragma allows implementation to
 deploy simpler code and forgo infinity properties
-----
IEEE 754 Related Features
Complex Arithmetic, Functions

Sample implementation of multiply and divide use just one isnan test to
condition
 special case code
Multiply and divide must raise deserved exceptions and may raise spurious
 ones
Imaginary unit I
 float imaginary constant
 x + y*I, where x, y are of same real type, requires no actual FP ops
Complex library
cacos casin catan ccos csin ctan cacosh casinh catanh ccosh csinh ctanh
cexp clog csqrt cabs cpow carg conj cimag cproj creal

Inclusion of  makes math functions generic for real, complex, and
imaginary
exp(z) = cexpf(z), if z is float complex
sin(y*I) = sinh(y)*I, if y is double
cos(y*I) = coshl(y), if y is long double
-----
IEEE 754 Related Features
Complex Functions  Special Cases

Annex G specifies non NaN results for special cases where useful for
preserving  magnitude or
direction information -- CONTENTIOUS
cexp(-inf+iNaN) = +-0+-i0 (where the signs of the real and imaginary parts of

 the  result are unspecified)
csqrt(x+iinf) returns +inf+iinf, for all  x  (including NaN)
cacos(+inf+iinf) returns pi/4-iinf
creal(x, iNaN) = x
-----
C99 Support for IEEE 754
Reception

C99 represents a 10 year, good faith effort by a language standard group,
with lots of
 help from the numerical community, to support IEEE 754
Being picked up by next Unix standard
Impact on next C++ standard TBD
Several vendors have implemented, or are implementing, all or part
HP-UX C for Itanium has essentially all of C99 FP
Careful, useful (reasonable performance) implementation requires great
attention to detail,
 beyond what can be expected of compiler teams
ROI seen as greater for performance work
Customer demand (for features beyond basics) seen as low, customer
appreciation TBD
Needs support from numerical community
 affirmation of value
 demonstration of utility

Prospective implementers would like to see customer demand to justify the investment. Kahan: this is a symptom of a widespread problem. IMSL has been absorbed, NAG faces bankruptcy. Nobody pays for good stuff when you can download something from the web. Nobody studies floating-point arithmetic any more. Ultimately some group like 754R will have to cater to clueless Visual Basic programmers by providing arbitrary precision effortlessly. Something more like Maple or Mathematica will ultimately be required.

R James: what should 754R do for the next rev of C++? Kahan: explain expression evaluation in terms of desired behavior rather than with the "semantic type" (float_t) mechanisms; according to Darcy the latter was due to C's lack of overloading. Kahan: the default expression evaluation had better be maximum accuracy; almost nobody is left who can debug rounding errors.

Nextafter/nexttoward - should the second argument be generic? If the arguments are equal, which should be returned? [Hough: nextafter/nexttoward are mistakes - since almost all examples except nonlinear equation solving, the second argument is a constant. Instead the function should have been Nextup(x) which returns the next machine-representable number toward +infinity, and from which a nextafter-like function can be easily coded. Type is important here, because unlike arithmetic functions, the exact answer depends on the type of the argument. So in the presence of multiple possible expression evaluation rules, either this function is not generic or its argument is limited to typed variables or constants, not expressions.]

Expression evaluation is complicated in cases like HP compilers which supported both a long double = x86 extended and a 128-bit quad format.


Decisions Reached Thus Far on technical content

We need a summary of all "final" decisions reached so far, as recorded in the minutes, in order to update the draft proposal. Here it is:

754 | revision | FAQ | references