http://dvjames.com/esync/dvjTables2006Jul12.pdf
In Table 7.16 convertFromInteger
In Row 5 change:
max > x > min
to:
max > x > 0
(This same change should also be made to row 5 of Table 7.17,
convertFromUnsigned)
Suggest that 'up', 'down', and 'even' be replaced with 'Ru', 'Rd', and
'Re', respectively, where:
Ru = The nearest representable floating-point value larger than x.
Rd = The nearest representable floating-point value less than or equal to
x.
Re = Ru or Rd, whichever is even.
In Tables 7.16, 7.17, 7.20, 7.21, and 7.40, the terms 'MORE', 'HALF', and
'LESS' are not defined and it is confusing to use 'more' and 'MORE' to mean
different things. Suggest the following replacements:
Term     Suggested Replacement
------   ---------------------
up       Ru
down     Rd
even     Re
TRUE     x = Rd
MORE     x > (Ru+Rd)/2
HALF     x = (Ru+Rd)/2
LESS     x < (Ru+Rd)/2
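To make the proposed terms concrete, here is a minimal sketch for the convertFromInteger case with a binary64 destination. The function names (brackets, classify, re_of) are mine and not from the draft. It assumes Python 3.9 or later (for math.nextafter), a little-endian IEEE double layout, and an x well inside the finite range of the format.

```python
import math
import struct

def brackets(x: int):
    """Return (Rd, Ru) for integer x: Rd <= x < Ru, adjacent binary64 values."""
    f = float(x)  # CPython rounds int -> float to nearest, ties to even
    rd = f if int(f) <= x else math.nextafter(f, -math.inf)
    ru = math.nextafter(rd, math.inf)
    return rd, ru

def is_even(f: float) -> bool:
    """True when the least significant significand bit of f is zero."""
    (bits,) = struct.unpack("<Q", struct.pack("<d", f))
    return bits & 1 == 0

def classify(x: int) -> str:
    """Name the table case for x, compared exactly in integer arithmetic."""
    rd, ru = brackets(x)
    if int(rd) == x:
        return "x = Rd"
    twice_x, total = 2 * x, int(rd) + int(ru)  # compare x with (Ru+Rd)/2
    if twice_x > total:
        return "MORE"
    if twice_x == total:
        return "HALF"
    return "LESS"

def re_of(x: int) -> float:
    """Re: whichever of Rd and Ru has an even significand."""
    rd, ru = brackets(x)
    return rd if is_even(rd) else ru
```

For example, x = 2**53 + 1 lies exactly halfway between Rd = 2**53 and Ru = 2**53 + 2, so classify reports HALF and re_of picks Rd.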
In Table 7.40, replace 'SAME' with 'x=Rd' and use 'Re' in the result field
of the 'TiesToEven' column.
I find it confusing to give different meanings to a symbol in different
charts. The following is a list of the symbols used in this note:
Smax: The largest positive signed integer.
Smin: The most negative signed integer.
Umax: The largest unsigned integer.
iNaN: A language-specified signed-integer representation of not-a-number.
iPos: A language-specified signed-integer representation of positive
      overflow.
iNeg: A language-specified signed-integer representation of negative
      overflow.
jNaN: A language-specified unsigned-integer representation of not-a-number.
jPos: A language-specified unsigned-integer representation of positive
      overflow.
jNeg: A language-specified unsigned-integer representation of negative
      overflow.
Note two new concepts: iPos and iNeg (and jPos and jNeg). These permit an
implementation to report positive and negative overflow differently from
iNaN (or jNaN). The rationale for this is given in the next few
paragraphs.
The tables for conversion to signed and unsigned integer can be simplified
by first defining t = int(x) and then using t in the table.
This is used in the examples below to simplify the suggested changes.
Let t = int(x), then I believe the most desirable result for clampToSigned
would be: maxNum(Smin,minNum(Smax,t)) and for clampToUnsigned would be:
maxNum(0,minNum(Umax,t)). In such an implementation, iPos=Smax, iNeg=Smin,
jPos=Umax, and jNeg=0. The implementation should be free to choose either
iPos or iNeg for iNaN (and jPos or jNeg for jNaN), or even choose a value
different from either of these.
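The clamp formulas above can be sketched as follows. This is only an illustration under stated assumptions: 32-bit integer formats, int(x) taken as round toward zero, and iNaN and jNaN set to Smin and 0 (the draft leaves these to the language, so the constants here are hypothetical choices). Each function returns the result together with a flag saying whether invalid operation would be raised.

```python
import math

S_MAX = 2**31 - 1   # Smax for an assumed 32-bit signed format
S_MIN = -2**31      # Smin
U_MAX = 2**32 - 1   # Umax for an assumed 32-bit unsigned format
I_NAN = S_MIN       # iNaN: hypothetical language choice
J_NAN = 0           # jNaN: hypothetical language choice

def _int_of(x: float):
    """t = int(x); infinities pass through so min/max can clamp them."""
    return x if math.isinf(x) else math.trunc(x)  # round toward zero assumed

def clamp_to_signed(x: float):
    """maxNum(Smin, minNum(Smax, t)); only NaN raises invalid."""
    if math.isnan(x):
        return I_NAN, True
    return max(S_MIN, min(S_MAX, _int_of(x))), False

def clamp_to_unsigned(x: float):
    """maxNum(0, minNum(Umax, t)); only NaN raises invalid."""
    if math.isnan(x):
        return J_NAN, True
    return max(0, min(U_MAX, _int_of(x))), False
```

With these definitions clamp_to_signed(1e20) and clamp_to_signed(math.inf) both give Smax without invalid, and clamp_to_unsigned(-3.7) gives 0.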
Converts should deliver the same results as the clamps; the only
difference should be the cases for which invalid operation is reported.
To be consistent with the concept that infinity is the logical
extrapolation of larger and larger values, clamping an infinity should not
set invalid operation, since clamping values less than infinity does not.
Table 7.18 convertToInteger operations
Let t = int(x)
                      DVJ      Suggest
                      -----    -------
t is NaN              iNaN*    iNaN*$
t = +Inf              iNaN*    iPos*$
Smax < t < +Inf       iNaN*    iPos*$
t = Smax              Smax     Smax
0 < t < Smax          t        t
t = 0                 0        0
Smin < t < 0          t        t
t = Smin              Smin     Smin
-Inf < t < Smin       iNaN*    iNeg*$
t = -Inf              iNaN*    iNeg*$
* Invalid operation is raised.
$ Inexact is undefined in this case.
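As a sketch of the 'Suggest' column only, the following assumes a 32-bit signed format, int(x) taken as round toward zero, iPos = Smax, iNeg = Smin, and a hypothetical iNaN = Smin. It returns the result and whether invalid operation would be raised; inexact is not modelled.

```python
import math

S_MAX = 2**31 - 1             # Smax for an assumed 32-bit signed format
S_MIN = -2**31                # Smin
I_POS, I_NEG = S_MAX, S_MIN   # iPos, iNeg: the clamp-compatible choice
I_NAN = S_MIN                 # iNaN: hypothetical language choice

def convert_to_integer(x: float):
    """Return (result, invalid) following the 'Suggest' column."""
    if math.isnan(x):
        return I_NAN, True
    # t = int(x); infinities are kept so the range tests below catch them
    t = x if math.isinf(x) else math.trunc(x)  # round toward zero assumed
    if t > S_MAX:
        return I_POS, True    # positive overflow, invalid raised
    if t < S_MIN:
        return I_NEG, True    # negative overflow, invalid raised
    return t, False           # in range: t itself, no exception
```

The delivered values match the clamp; the only difference is that out-of-range inputs, including the infinities, raise invalid.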
Table 7.21 convertToUnsignedExact operations
Let t = int(x)
                      DVJ      Suggest
                      -----    -------
t is NaN              jNaN*    jNaN*$
t = +Inf              jNaN*    jPos*$
Umax < t < +Inf       jNaN*    jPos*$
t = Umax              Umax     Umax
0 < t < Umax          t        t
t = 0                 0        0
-Inf < t < 0          jNaN*    jNeg*$
t = -Inf              jNaN*    jNeg*$
Table 7.23 clampToIntegerExact operations (inexact signaling).
Let t = int(x)
                      DVJ      Suggest
                      -----    -------
t is NaN              iNaN*    iNaN*$
t = +Inf              iNaN*    Smax
Smax < t < +Inf       Smax     Smax
t = Smax              Smax     Smax
0 < t < Smax          t        t
t = 0                 0        0
Smin < t < 0          t        t
t = Smin              Smin     Smin
-Inf < t < Smin       Smin     Smin
t = -Inf              iNaN*    Smin
* Invalid operation is raised.
$ Inexact is undefined in this case.
Table 7.25 clampToUnsignedExact operations (inexact signaling).
Let t = int(x)
                      DVJ      Suggest
                      -----    -------
t is NaN              jNaN*    jNaN*
t = +Inf              jNaN*    Umax
Umax < t < +Inf       Umax     Umax
t = Umax              Umax     Umax
0 < t < Umax          t        t
t = 0                 0        0
-Inf < t < 0          0        0
t = -Inf              jNaN*    0
* Invalid operation is raised.
Detailed examination of these figures has revealed another problem.
Assume, for example, the language supports five floating-point and eight
fixed-point formats, as follows:
Type                           Bits   Symbol
----------------------------   ----   ------
Binary Floating Point            32   BFP32
                                 64   BFP64
                                128   BFP128
Decimal Floating Point           64   DFP64
                                128   DFP128
Signed Fixed-Point Integer        8   S8
                                 16   S16
                                 32   S32
                                 64   S64
Unsigned Fixed-Point Integer      8   U8
                                 16   U16
                                 32   U32
                                 64   U64
For float-to-fixed conversion alone, there are forty combinations! In many
implementations, the hardware provides conversion from all floating-point
formats to the widest supported signed fixed-point format. Conversion to
shorter fixed-point formats and to all unsigned formats is left to the
software. Since these functions are expected to occur frequently, they are
normally placed in-line and designed to be fast. For example, conversion
from BFP64 to U8 could be done in two steps:
Step 1: BFP64 to S64
Step 2: S64 to U8
(On some machines, each step is a single instruction.)
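A rough model of this two-step path, next to a clamped version for comparison, might look like the sketch below. The function names are hypothetical, step 1 is assumed to truncate toward zero with x already inside the S64 range, and step 2 is assumed to keep the low 8 bits, as a plain fixed-point narrowing commonly does.

```python
import math

def bfp64_to_s64(x: float) -> int:
    """Step 1: model of a BFP64 -> S64 convert (assumes x is in S64 range)."""
    return math.trunc(x)

def s64_to_u8(t: int) -> int:
    """Step 2: fixed-point narrowing that simply keeps the low 8 bits."""
    return t & 0xFF

def fast_to_u8(x: float) -> int:
    """The fast in-line path: no range check, no exception reported."""
    return s64_to_u8(bfp64_to_s64(x))

def clamped_to_u8(x: float) -> int:
    """What a clamp to U8 would deliver for the same finite input."""
    return max(0, min(255, math.trunc(x)))

print(fast_to_u8(300.0), clamped_to_u8(300.0))   # 44 255
print(fast_to_u8(-1.0), clamped_to_u8(-1.0))     # 255 0
```

In-range inputs agree on both paths; out-of-range inputs wrap silently on the fast path.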
Most will agree that step 1 is an IEEE operation that must conform to the
standard. However, since this area of 754 is vague, many have assumed that
step 2 is a fixed-point operation and thus under a different set of rules.
(The results of fixed-point operations are undefined in exceptional
situations.)
It should be obvious that the above does not conform to the requirements of
the current draft. It should also be obvious that the performance
implications of changing this code to conform are enormous. Clamping could
increase the execution time by an order of magnitude. Correct handling of
exceptions could be even worse. It should also be noted that conversions
(especially
those for unsigned results) may well set multiple exceptions such as
inexact and invalid operation or overflow and invalid operation. Even
more overhead will be required in the normal path to clean this up. Most
users will prefer fast conversions over conforming conversions.
I, for one, am in favor of relaxing the draft in this area.
Regards, Ron Smith