Thread Links Date Links
Thread Prev Thread Next Thread Index Date Prev Date Next Date Index

Motion: Number format



Note: Here, IEEE 754 is IEEE 754-2008 (currently the latest version
of the IEEE 754 standard). Since not everyone can read Unicode symbols,
the infinity symbol is denoted oo (not inf, because inf is also an
operation defined by P1788).

Motion
======

An argument or a result of some operations covered by the standard
can be a number. A number is defined at Level 1 as being any member
of the set Rbar = R U { -oo, +oo } of extended reals. This motion
defines the corresponding notion at Level 2, called "number format"
and denoted here F. For an inf-sup interval type, the bounds are
the members of a number format F. An implementation must support
at least one number format F. The requirements and recommendations
are:

A1. F corresponds to a finite subset of R together with 3 special
datums: -oo, +oo (the infinities of Rbar) and NaN (not-a-number).

A2. A datum can have different variants or different representations
(e.g. IEEE 754 qNaN / sNaN, cohorts for decimal floating-point
numbers, signed zeros), but they are not distinguished in F, possibly
except for the number 0, which may be regarded as signed by the P1788
implementation; in such a case, it has exactly two possible values,
denoted -0 and +0. When 0 is unsigned, the notation -0 or +0 can be
used and simply denotes 0.

A3. The number 0 (either a single element 0 or the two elements -0
and +0) must be in F.

A4. The format must be symmetric, i.e. if a real x is in F, then -x
is also in F.

A5. An element of Rbar (Level 1) is mapped (rounded) to an element
of F (Level 2) according to the following rules:
  * A constraint can be given on the rounding direction. It must
    be satisfied.
  * Following this constraint, the rounded result should be an
    element of F that is the closest to the exact result (with
    a special implementation-defined rule for the distance to
    an infinity).
  * In general, this element is unique, but there may exist two
    elements satisfying this property (halfway cases). How these
    cases are rounded should be documented by the implementation.

The notion of a 754-conforming implementation has been introduced
in Motion 6.04. A 754 format must have the following additional
requirements:

B1. The infinities correspond to the IEEE 754 infinities.

B2. NaN corresponds to the IEEE 754 Level 2 NaN. At IEEE 754 Level 3,
it can be either qNaN or sNaN, but if a NaN is produced as a result,
it must be a qNaN (except documented otherwise by P1788).

B3. The number 0 is signed.

B4. Concerning A5, for each operation where an interval of type T is
involved, there must be at least one number format F (the one of the
interval bounds for inf-sup, or documented by the implementation for
implicit interval types) such that the rounding must be done according
to the rules given by the IEEE 754 standard (correct rounding).
In the case of rounding-to-nearest (no constraints on the direction),
which rounding function (among the two possible rounding-direction
attributes) is used must be defined by the implementation.

The notion of number format F compatible with an interval type T can
be defined, if need be. The idea is to require F to be dense enough
in order to satisfy some properties when returning a numeric value.
Later motions can require that for some specific operations, F must
be compatible with T (but shouldn't if this isn't really necessary).
A number format F is said to be compatible with an interval type T
if:

C1. For each non-empty interval I of type T (Level 2), there exists
a finite number x of F such that x \in I.

Later motions may define other constraints (when need be).

Rationale
=========

The above constraints have been chosen not to be too strict in order
to allow conventional formats, such as floating-point numbers (with
or without subnormals), fixed-point numbers (including integers),
rational numbers (with bounded numerator and denominator),
double-double numbers (provided by the "long double" C type in the
current PowerPC ABI).

A1. We require the subset to be finite is order to be able to specify
the rounding without ambiguity. A discrete subset would be sufficient
for this purpose, but P1788 will probably also need the notion of
smallest and largest finite element (see the discussions about the
midpoint). Arbitrary precision formats need to be parameterized (for
instance, by the precision and possibly the exponent size), so that
each format instance is finite.

A2. Ideally, one should not need to distinguish -0 and +0 as far
as P1788 is concerned, but this is done mainly for 754-conforming
implementations.

A3. The number 0 will probably be needed for some functions. And it
would be awkward not to require it anyway.

A4. Symmetry may not really be necessary, but it may expected by the
user, and all known formats are symmetric (well, in general, not the
native C integer types, but this is the cause of various bugs and
they don't have infinities and NaN anyway).

A5. Correct rounding is recommended, but not required because it may
be difficult or may not really be useful in exotic number formats.
For 754-conforming implementations, it is however required for some
number format associated with the interval type T, but not necessarily
for all supported number formats, so that we do not disallow things
like mixed radix.

C1. A consequence: one can define a Level 2 midpoint that belongs to
any non-empty interval of type T.

Note: The standard doesn't define operations on the number format
itself (e.g. the additions of two numbers). That's why the motion
doesn't contain any requirement concerning such operations.

-- 
Vincent Lefèvre <vincent@xxxxxxxxxx> - Web: <http://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <http://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)