
My P754-150 comments



Dear 754 people,

For reasons out of my control, I was quite late in writing up my
comments on the standard. I submitted them yesterday, but I can still
change a few things until the deadline tomorrow. This is why I am
sending this email with all my comments, asking anyone who is free to
ponder some of them and to correct me if I am really off track.

In particular, I am concerned about three comments.

1. Given the recent discussion on detection of underflow, here is
another question.

page 48, subclause 7.5, lines 20 and 21:
change
"If the rounded result is exact, no flag is raised and no Inexact
exception is signaled."
to
"If the rounded result is exact, the underflow flag is not raised and
the inexact exception is not signaled."
which clarifies the meaning of the text.

Does that part mean that the underflow exception is signaled (as
stated in the first line of 7.5: "the underflow exception shall be
signaled when a tiny non-zero result is detected") but the flag is not
raised (as stated here)? If so, how is it "signaled" under default
exception handling? Is that part in conflict with subclause 7.1,
line 7 on page 46, which says "the implementation shall provide a
status flag that shall be raised when the corresponding exception is
signaled"?

I feel that either I am confusing myself or the text needs to be a bit
more precise in its various parts.
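For comparison only, Python's decimal module (which follows the General
Decimal Arithmetic specification, not this draft) keeps the two notions
apart in the way the proposed wording suggests: a tiny result is
reported through a separate Subnormal condition, and the Underflow flag
is raised only when the tiny result is also inexact. The small context
(3 digits, Emin = -5) is an arbitrary choice for this sketch.

```python
from decimal import Context, Decimal, Inexact, Subnormal, Underflow

# Illustrative context (an arbitrary choice): 3 digits, Emin = -5, so the
# smallest exponent of a subnormal (Etiny) is Emin - prec + 1 = -7.
ctx = Context(prec=3, Emin=-5, Emax=5)


def flags_after(op):
    """Run op() with cleared flags and report three of the context's flags."""
    ctx.clear_flags()
    result = op()
    raised = {
        "subnormal": bool(ctx.flags[Subnormal]),
        "underflow": bool(ctx.flags[Underflow]),
        "inexact": bool(ctx.flags[Inexact]),
    }
    return result, raised


# Tiny but exact: 1E-3 * 1E-3 = 1E-6 lies below 10**Emin yet fits exactly.
exact = flags_after(lambda: ctx.multiply(Decimal("1E-3"), Decimal("1E-3")))

# Tiny and inexact: 1.23E-6 would need exponent -8 < Etiny, so it rounds to 1.2E-6.
inexact = flags_after(lambda: ctx.multiply(Decimal("1.23E-3"), Decimal("1E-3")))

print(exact)    # Subnormal raised; Underflow and Inexact not raised
print(inexact)  # Subnormal, Underflow and Inexact all raised
```

In this model the exact tiny result is "detected" but leaves the
underflow flag alone, which is one consistent reading of 7.5.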


2. My second concern is about our dear pow(x,y).

page 55, line 2 (stating that pow(x,y) is invalid for negative x and
non-integer y)

I disagree with this line.
As a counterexample, take x = -32 = -(2^5) and y = 0.2 = 1/5. Then
x^y is defined in mathematics as the real fifth root of -32, so
(-32)^{0.2} = -2. Hence, pow(-32, 0.2) should return -2; it should
not return a qNaN and should not signal invalid.

The line as it stands is correct for cases such as pow(-32, 0.5)
and, as far as I can tell, for all the binary formats: a non-integer y
in binary is a rational number whose denominator (in lowest terms) is
a power of two. Effectively, we are then trying to take an even root
of x, and this is not defined for negative values of x.

However, the line is wrong for the decimal formats. A non-integer y
in decimal is a rational number whose denominator is a power of ten.
If the numerator of y has enough factors of two to cancel those in the
denominator, then y effectively specifies an odd root. Such a root is
defined in mathematics for negative values of x.

Note that the start of clause 9 on page 51 states that the functions
are recommended for all the supported formats, including the decimal
ones. Hence, our definition must be correct for decimal as well as
binary.
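A minimal Python sketch of the argument, assuming the exponent arrives
as an exact decimal value: Fraction(y) gives y in lowest terms, and the
parity of its denominator decides whether a real result exists for
negative x. The names real_pow and has_real_value_for_negative_x are
hypothetical, not proposed 754 operations, and the magnitude is
computed in binary floating point, so the results are only approximate.

```python
from decimal import Decimal
from fractions import Fraction


def has_real_value_for_negative_x(y: Decimal) -> bool:
    """True if x**y is real for x < 0, i.e. y in lowest terms has an odd denominator."""
    return Fraction(y).denominator % 2 == 1


def real_pow(x: float, y: Decimal) -> float:
    """Hypothetical real-valued x**y for a decimal exponent y.

    Negative x is accepted when y = num/den (lowest terms) has an odd den;
    the result is then the real den-th root raised to num, negative exactly
    when num is odd. The magnitude uses binary floating point (approximate).
    """
    q = Fraction(y)  # exact conversion of the decimal value
    if x >= 0:
        return x ** float(q)
    if q.denominator % 2 == 0:
        raise ValueError("even root of a negative number (the draft's invalid case)")
    magnitude = (-x) ** float(q)
    return -magnitude if q.numerator % 2 else magnitude


print(real_pow(-32.0, Decimal("0.2")))  # approximately -2
print(real_pow(-32.0, Decimal("0.4")))  # approximately 4: numerator 2 is even
print(has_real_value_for_negative_x(Decimal("0.5")))  # False: 1/2 is an even root
```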


3. My third concern is about compound(x,n).

page 53, Table 13, entry for compound(x,n):
Why is the domain restricted to [-1, +\infty] \times Z rather than
[-\infty, +\infty] \times Z, as for the pown(x,n) function?

Since
compound(x,n) = pown((1+x),n),
and the right-hand side accepts any x, so should the left-hand side.

I suggest changing the domain of compound to match that of pown and
removing the invalid signal for x<-1.
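A sketch of the identity, using exact rationals so that rounding does
not obscure it; compound and pown here are hypothetical Python helpers
that only mirror the names in Table 13.

```python
from fractions import Fraction


def pown(x: Fraction, n: int) -> Fraction:
    """x**n for integer n; any x is accepted (0**n with n < 0 raises ZeroDivisionError)."""
    return x ** n


def compound(x: Fraction, n: int) -> Fraction:
    """Hypothetical compound(x, n) defined directly as pown(1 + x, n), with no x >= -1 limit."""
    return pown(1 + x, n)


# x < -1 gives an ordinary value, just as pown does for a negative base.
print(compound(Fraction(-3), 2))   # (1 - 3)**2 = 4
print(compound(Fraction(-3), 3))   # (1 - 3)**3 = -8
print(compound(Fraction(-3), -1))  # (1 - 3)**-1 = -1/2
```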


In addition to these three main issues, I also suggest that we omit
the detailed grammar of the hexadecimal representation and that we
clearly define the words "identical" and "literal meaning". The
reasons are given in the comments below.

All my comments follow.
--------
Comments on the draft revision of the IEEE 754 standard, version 1.50

Technical:

1. page 8, line 24,
change "floating-point numbers" to "floating-point data", since the
standard describes the handling of NaNs, which are not numbers, as
indicated in the glossary.



2. page 8, line 26,
change "will be identical" to "will be identical at the representational level".
The reason is that an operation with two qNaN inputs delivers one of
them as its result, but the standard does not specify which one
(subclause 6.2.3, page 45, lines 14 and 15).

According to subclause 6.2.3, one implementation may use a rule that
returns the left operand, yielding f(qNaN1, qNaN2) = qNaN1, while
another returns the right operand, yielding f(qNaN1, qNaN2) = qNaN2.

At the representational level (level 3 of Table 1 on page 15), qNaN1
and qNaN2 are the same, i.e., both are simply a qNaN. They are not
identical at the encoding level (level 4 of Table 1).
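The two-level distinction can be seen with Python's binary64 floats and
the struct module. In this sketch, add_left_rule is a hypothetical
implementation that always returns the left NaN, one of the choices
subclause 6.2.3 allows; the payloads 1 and 2 for qNaN1 and qNaN2 are an
arbitrary choice. Swapping the operands also changes the encoding of
the result, which is the commutation problem raised in comment 3.

```python
import math
import struct


def qnan(payload: int) -> float:
    """Build a binary64 quiet NaN whose trailing significand carries payload."""
    bits = 0x7FF8_0000_0000_0000 | payload  # all-ones exponent, quiet bit set
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


def encoding(x: float) -> int:
    """Level-4 view of x: its 64-bit encoding as an integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def add_left_rule(a: float, b: float) -> float:
    """Hypothetical add() that, given NaN operands, always returns the left NaN."""
    if math.isnan(a):
        return a
    if math.isnan(b):
        return b
    return a + b


qnan1, qnan2 = qnan(1), qnan(2)
r12 = add_left_rule(qnan1, qnan2)
r21 = add_left_rule(qnan2, qnan1)

print(math.isnan(r12), math.isnan(r21))        # both are "a qNaN" at level 3
print(hex(encoding(r12)), hex(encoding(r21)))  # but their encodings differ
```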




3. The same issue of two NaN inputs affects the definition of
"literal meaning" in subclause 10.4. Within a single implementation,
the operands of addition and multiplication cannot be commuted if that
implementation's rule is always to return the left (or, similarly,
the right) NaN operand:

add(qNaN1, qNaN2)=qNaN1
but
add(qNaN2, qNaN1)=qNaN2


Shall we define "literal meaning" at the representational level?
This would allow us to avoid the two-NaNs problem. However, at the
representational level, producing two different members of the same
cohort in the decimal formats may break the "literal meaning". If we
move "literal meaning" to the floating-point data level (level 2 of
Table 1), we lose the distinction between qNaN and sNaN.


I suggest that we define "literal meaning" at the representational
level, with an explicit statement that members of the same cohort are
acceptable.
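For the cohort concern, Python's decimal module (an implementation of
the General Decimal Arithmetic specification, used here only as an
illustration) shows members of one cohort comparing equal while keeping
different exponents, and different computations of the same value
landing on different members.

```python
from decimal import Decimal

# Three members of the cohort of the value one: equal values, different exponents.
cohort = [Decimal("1"), Decimal("1.0"), Decimal("1.00")]

print(all(m == cohort[0] for m in cohort))      # True: the same floating-point datum
print([m.as_tuple().exponent for m in cohort])  # [0, -1, -2]: different representations

# Two ways of computing the value one land on different cohort members.
print(Decimal("1.0") * Decimal("1.0"), Decimal("2") - Decimal("1"))  # 1.00 1
```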




4. Include glossary entries for "identical", as defined in subclause
1.4, and for "literal meaning", as defined in subclause 10.4.


5. subclause 2.2.36, definition of normal number:
This is the first use of the symbol b. We should define it here as the
radix, just as the symbols e, q, and p are defined in the entry for
exponent (2.2.17).



6. subclause 2.2.39, definition of precision:
Add "This standard uses the symbol p for the precision." This is the
logical place to define p; currently it is defined in subclause 2.2.17
but not in subclause 2.2.39.


7. subclause 2.2.43, definition of radix:
Add "This standard uses the symbol b for the radix." This is the
logical place to define b.


8. subclause 2.2.46, definition of significand:
The current definition is incomplete: it only explains the significand
of finite numbers.
Add "It is also a component of the encoding of infinities and NaNs as
explained in clause 3."


9. After subclause 2.3 but before clause 3 (i.e., on page 13):
Add a subclause 2.4 on symbols, containing the definitions of the
following symbols:
b, p, emin, emax, q, c, s, T, G, m, r, v, J, Q(x), H, P1, and P2.


10. page 15, line 26,
change
"p=the number of significant digits (precision)"
to
"p=the number of significand digits (precision)"
where the last letter of the word significand is d, not t. This change
conforms to the definition in subclause 2.2.17 (and in subclause
2.2.39, given comment 6 above).


11. page 16, line 23,
change "reduced precision" to "reduced number of significant digits".
According to subclause 2.2.39, the precision is fixed for a specific
format and does not vary. The number of significant digits, on the
other hand, does vary in the case of subnormal numbers.
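The distinction can be made concrete for binary64 (p = 53). This Python
sketch reads the encoding with struct; significant_bits is a
hypothetical helper that counts the bits from the leading 1 of the
significand down, a count that stays at p for normal numbers but
shrinks for subnormal ones.

```python
import math
import struct

P = 53  # precision of binary64: fixed for the format


def significant_bits(x: float) -> int:
    """Significant bits actually carried by a finite, nonzero binary64 value x."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    biased_exponent = (bits >> 52) & 0x7FF
    fraction = bits & ((1 << 52) - 1)
    if biased_exponent == 0:
        # Subnormal: no implicit leading 1, so leading zeros of the fraction
        # are not significant and the count drops below P.
        return fraction.bit_length()
    return P  # normal: implicit leading 1 plus 52 fraction bits


for x in (1.0, math.ldexp(1.0, -1022), math.ldexp(1.0, -1070), math.ldexp(1.0, -1074)):
    print(x, significant_bits(x))  # second column: 53, 53, 5, 1
```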


12. page 17, first paragraph:
This paragraph uses the word normalization, which should be defined
here before it is used. The definition should also appear in the
glossary. Moreover, the description of normalization presented in the
paragraph is incomplete. I suggest changing

"Each floating-point number has just one encoding in a binary
interchange format. To make the
encoding unique, in terms of parameters in 3.3, the value of the
significand m is maximized by
decreasing e until either e=emin or m>=1. After this normalization
process is done, if e=emin
and 0<m<1, the floating-point number is subnormal. Subnormal numbers
(and zero) are encoded with
a reserved biased exponent value."

to be

"Each floating-point number has just one encoding in a binary
interchange format. To make the
encoding unique, in terms of parameters in subclause 3.3, if the value
of the significand m<1,
this value is maximized by
decreasing e until either e=emin or 1<=m<b. On the other hand, if
m>=b, its value is minimized
by increasing e until either e=emax or 1<=m<b. This process of
adjusting the value of m and e
to make the encoding unique is called normalization. After the
normalization is done, if e=emin
and 0<m<1, the floating-point number is subnormal. Subnormal numbers
(and zero) are encoded with
a reserved biased exponent value."
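The proposed normalization can be sketched in Python with exact
rationals, so that the value m * b^e is visibly preserved at each step;
normalize and is_subnormal are hypothetical helpers, and the toy
parameters (b = 2, emin = -10, emax = 10) are arbitrary.

```python
from fractions import Fraction


def normalize(m: Fraction, e: int, b: int, emin: int, emax: int) -> tuple[Fraction, int]:
    """Normalize (m, e), representing m * b**e with m > 0, per the proposed wording.

    If m < 1, decrease e (scaling m up by b) until e == emin or 1 <= m < b.
    If m >= b, increase e (scaling m down by b) until e == emax or 1 <= m < b.
    The value m * b**e is unchanged at every step.
    """
    if m <= 0:
        raise ValueError("m must be positive")
    while m < 1 and e > emin:
        m, e = m * b, e - 1
    while m >= b and e < emax:
        m, e = m / b, e + 1
    return m, e


def is_subnormal(m: Fraction, e: int, emin: int) -> bool:
    """After normalization: subnormal iff e == emin and 0 < m < 1."""
    return e == emin and 0 < m < 1


print(normalize(Fraction(1, 4), 0, 2, -10, 10))  # (Fraction(1, 1), -2)
print(normalize(Fraction(3), 0, 2, -10, 10))     # (Fraction(3, 2), 1)
m, e = normalize(Fraction(1, 8), -9, 2, -10, 10)
print(m, e, is_subnormal(m, e, -10))             # 1/4 -10 True
```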


13. pages 17, 18, and 19:
change all instances of a capital S for the sign bit to a small s, as
defined on page 16.



14. page 19, line 13:
change the three instances of a capital C to a small c, as defined on
page 16.



15. page 22, Table 8, entry for the precision in binary formats:
When k is a power of two (32, 64, 128, ...), log2(k) is an integer, so
there is no need to use the int function or to define it below the
table; just use
  k - 4\times log2 (k) + 13
(For other multiples of 32, such as k = 96 or 160, log2(k) is not an
integer, so a rounding function would still be needed there.)



16. page 28, line 40:
change "with infinite range" to "with infinite precision", given the
definition of precision in subclause 2.2.39. (Range is not defined
anywhere in the standard.)


17. page 29, line 32:
Add "However, an invalid exception can arise due to the
multiplication." This is just an
additional clarification.

18. page 36, line 24:
specify explicitly the result and the signals raised for a qNaN
operand. I assume the result is the same qNaN, with no signals at all.

19. page 43, subclause 5.12.3:
Clarify that this is not a true hexadecimal format with a radix of 16
but rather a binary format represented using hexadecimal digits for
the significand and exponent fields to simplify their display. Hence,
the hexsignificand is multiplied by two (not 16) raised to the power
of the hexexponent.
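As an illustration of the existing convention rather than of the
draft's grammar itself, Python's float.fromhex and float.hex use the
C99 hexadecimal-significand notation. In that notation the digits
after 0x are hexadecimal, while the exponent after p is written in
decimal and scales by a power of two.

```python
# Python's float.fromhex / float.hex use the C99 hexadecimal-significand notation.
x = float.fromhex("0x1.8p1")  # significand 0x1.8 = 1.5, times 2**1
y = float.fromhex("0x1p10")   # the exponent "10" is decimal: 2**10, not 2**16

print(x, y)         # 3.0 1024.0
print((3.0).hex())  # 0x1.8000000000000p+1
```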

I would prefer that we not specify the grammar so rigidly and instead
leave it to the language standards. We should provide just a general
description, as is done for the decimal character sequence. The chosen
symbols may restrict what languages currently use or will want to use
in the future.

To illustrate the point, here are a few examples.
- A programming language may have another use for 0x and p.
- A programming language designed by a European or an Arab may use ","
  instead of "." as the decimal separator.
- A programming language, or even just a program in any general
  language, designed by an Arab may use the Indic (also called eastern)
  Arabic numerals instead of the western Arabic numerals. (The western
  Arabic numerals are the ones used with the Latin script, as well as
  with the Arabic script in the western part of the Arab world, i.e.,
  Libya and all countries to its west. The eastern Arabic numerals are
  the ones used in the eastern part of the Arab world, i.e., Egypt and
  all countries to its east.) The two sets of numerals have different
  Unicode code points. Most software applications dealing with the
  Arabic script (the second most widely used script after Latin, used
  for over fifteen languages besides Arabic) support both sets of
  numerals and accept either set of Unicode code points when getting
  data from users or providing them with output.
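As one data point, Python already accepts both sets of digits when
parsing decimal strings. The sketch shows the ARABIC-INDIC DIGIT
characters (U+0660 to U+0669) having their own code points yet the
same decimal values; it does not test the Arabic decimal separator.

```python
import unicodedata

western = "32.5"
eastern = "\u0663\u0662.\u0665"  # 3, 2, 5 written with ARABIC-INDIC DIGITs

# The digits have different code points but the same decimal values.
print([f"U+{ord(c):04X}" for c in eastern if c != "."])       # ['U+0663', 'U+0662', 'U+0665']
print([unicodedata.decimal(c) for c in eastern if c != "."])  # [3, 2, 5]

# Python's float() accepts any Unicode decimal digits, so both strings parse alike.
print(float(western) == float(eastern))  # True
```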

I strongly suggest that we get ourselves out of this problem by
staying away from the detailed
description of the external representation.



20. page 48, subclause 7.5, lines 20 and 21:
change
"If the rounded result is exact, no flag is raised and no Inexact
exception is signaled."
to
"If the rounded result is exact, the underflow flag is not raised and
the inexact exception is not signaled."
which clarifies the meaning of the text.

Does that part mean that the exception is signaled but the flag is not
raised? If so, how is it "signaled" under default exception handling?
Is that part in conflict with subclause 7.1, line 7 on page 46, which
says "the implementation shall provide a status flag that shall be
raised when the corresponding exception is signaled"?


21. page 49, line 9,
change "(e.g., divideByZero)" to "(e.g., divideByZero as in div(0,0))"


22. page 51, line 32,
change
"It is the sign of its residue times the sign of zero ..."
to
"It is the exclusive OR of the sign of its residue and the sign of zero ..."

("Times" may be misread as the AND operation on the two sign bits.)
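A quick check of the sign arithmetic, written in Python: with sign bits
s1 and s2, the sign of (-1)^s1 times (-1)^s2 matches s1 XOR s2 in all
four cases, whereas s1 AND s2 differs in three of them. sign_bit is a
hypothetical helper built on math.copysign.

```python
import math


def sign_bit(x: float) -> int:
    """1 if the sign bit of x is set (this includes -0.0), otherwise 0."""
    return 1 if math.copysign(1.0, x) < 0 else 0


for s1 in (0, 1):
    for s2 in (0, 1):
        # The sign of the product (-1)**s1 * (-1)**s2, expressed as a sign bit.
        product = sign_bit((-1.0) ** s1 * (-1.0) ** s2)
        print(s1, s2, "product:", product, "xor:", s1 ^ s2, "and:", s1 & s2)
```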


23. page 53, Table 13, entry for compound(x,n):
Why is the domain restricted to [-1, +\infty] \times Z rather than
[-\infty, +\infty] \times Z, as for the pown(x,n) function?

Since
compound(x,n) = pown((1+x),n),
and the right-hand side accepts any x, so should the left-hand side.

I suggest changing the domain of compound to match that of pown and
removing the invalid signal for x<-1.



24. page 54, line 16
change
"(even when x=-1 or quiet NaN)"
to
"(even when x=-1, quiet NaN, or infinity)"
in those three locations.

25. page 54, line 18
Add the case for odd n>0
I suggest
compound(-1,n)= +- 0 according to the rounding direction for odd n>0
(-0 for roundTowardNegative and +0 otherwise)
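The suggested rule follows from computing compound(-1, n) literally as
pown(1 + x, n), since 1 + (-1) is an exact zero whose sign depends on
the rounding direction. Below is a sketch with Python's decimal module,
where ROUND_FLOOR plays the role of roundTowardNegative; compound here
is a hypothetical helper, not a library function.

```python
from decimal import ROUND_FLOOR, ROUND_HALF_EVEN, Context, Decimal


def compound(x: Decimal, n: int, ctx: Context) -> Decimal:
    """Hypothetical compound(x, n) = pown(1 + x, n), evaluated in ctx."""
    return ctx.power(ctx.add(Decimal(1), x), n)


for rounding in (ROUND_HALF_EVEN, ROUND_FLOOR):
    ctx = Context(rounding=rounding)
    # 1 + (-1) is an exact zero: +0, except -0 when rounding toward negative.
    print(rounding, compound(Decimal(-1), 3, ctx), compound(Decimal(-1), 2, ctx))
```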

26. page 54, lines 24 and 29
change
"(even a zero or quiet NaN)"
to
"(even a zero, a quiet NaN, or an infinity)"
in those two locations.


27. page 54, line 27
Add the case for odd n>0
I suggest
pown(+- 0,n)= +- 0  for odd n>0


28. page 54, line 32
Add the case for y an odd integer >0
I suggest
pow(+- 0,y)= +- 0  for y an odd integer >0
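For reference, C99 Annex F specifies pow(+-0, y) = +-0 for y an odd
integer > 0, and, as far as I know, Python's math.pow and float **
behave the same way on binary64, so comments 27 and 28 would match
existing practice. A short check:

```python
import math

# Analogues of pown(-0, 3) and pow(-0, 3.0) in Python's binary64 arithmetic.
for zero in (0.0, -0.0):
    for y in (3.0, 2.0):
        r = math.pow(zero, y)
        print(f"pow({zero}, {y}) = {r}")  # -0.0 only for zero = -0.0 and odd y
```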


29. page 55, line 2
I disagree with this line.
As a counterexample, take x = -32 = -(2^5) and y = 0.2 = 1/5. Then
x^y is defined in mathematics as the real fifth root of -32, so
(-32)^{0.2} = -2. Hence, pow(-32, 0.2) should return -2; it should
not return a qNaN and should not signal invalid.

The line as it stands is correct for cases such as pow(-32, 0.5)
and, as far as I can tell, for all the binary formats: a non-integer y
in binary is a rational number whose denominator (in lowest terms) is
a power of two. Effectively, we are then trying to take an even root
of x, and this is not defined for negative values of x.

However, the line is wrong for the decimal formats. A non-integer y
in decimal is a rational number whose denominator is a power of ten.
If the numerator of y has enough factors of two to cancel those in the
denominator, then y effectively specifies an odd root. Such a root is
defined in mathematics for negative values of x.

Note that the start of clause 9 on page 51 states that the functions
are recommended for all the supported formats, including the decimal
ones. Hence, our definition must be correct for decimal as well as
binary.






Editorial:

1. This general comment applies to all enumerations and itemizations.

When a sentence is broken over multiple lines for itemization, its
punctuation should be preserved. For example, the itemization on
line 31 of page 2 is
"Provide direct support for
 - execution-time diagnosis of anomalies
 - smoother handling of exceptions
 - interval arithmetic at a reasonable cost."
This itemization should use commas between the items and the word
"and" before the last one, to become
"Provide direct support for
 - execution-time diagnosis of anomalies,
 - smoother handling of exceptions,
and
 - interval arithmetic at a reasonable cost."
Note that this whole example is itself a single item (item d) of a
larger enumeration. It might be useful to reword the sentence that
starts this enumeration on line 22 to "The following points are among
the desiderata that guided the formulation of this standard." This is
a complete sentence that ends with a period. Each point in the
enumeration should then be rewritten as a full sentence in its own
right. If a point includes subitems (as in item d above), the subitems
should be punctuated correctly.




2. This is another general comment, regarding references to clauses
and subclauses in the standard.

The references are not consistent. For example, line 5 of page 40
reads
"described in 5.4.2, subject to limits stated in clause 5.12.2 below."
This should be
"described in subclause 5.4.2, subject to limits stated in subclause
5.12.2 below."
Note that I am using subclause, not clause, for 5.12.2. Subsubclause,
if preferred, would be more accurate but cumbersome. I suggest that we
keep the word clause for the top-level numbers (1, 2, 3, 4, 5, ...)
and use subclause for anything under a clause.

For consistency, I recommend that we go through the whole document
and, wherever there is a reference, use the appropriate word, either
clause or subclause.




3. Page 12, line 5, two periods end the sentence.


4. page 17, line 23: change (-1)^s \times +\infty to
(-1)^s \times (+\infty) to avoid two consecutive operators. Make the
same change on page 17, line 30; page 19, line 11; and page 19,
line 26.


5. page 24, the line numbers on the side of the page are wrong.


6. page 42, line 9, remove the overstrike from the "an" at the end of the line.

7. page 44, line 27, replace the last ")" on the line with ".".

8. page 48, line 16, change the first word "Thee" to be "The".

9. Page 50, line 15, two periods end the sentence.

10. page 52, line 16 and line 20,
in the various limits make sure that the sign after the zero is a
superscript to the zero.
It currently appears as a unary operator giving the sign of the function.

11. page 52, line 27,
change "as for, for example, sin(x)" to "as, for example, sin(x)"
(i.e., omit the extra for).

12. page 57, line 37,
there is a missing period after the word "format" and before "The latter".

13. page 60, line 45,
there is an extra ")" at the end of the sentence.
------------

If you have read this far, I must thank you :)

