
RE: round32 ( round64 ( X ) ) ?= round32 ( X )

There is a case where it works like you expect:

round_32( basic_op_64( convert_32_to_64(x_32), convert_32_to_64(y_32) ) ) 
does equal

IBM used this in the initial RS6000.
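
The right-hand side of the identity above did not survive in the archived message. The sketch below assumes it is the correctly rounded single-precision result of the same operation on x_32 and y_32, and checks that reading empirically for +, -, * and / using exact rational arithmetic. The helper names (round_to_bits, random_float32, double_rounding_mismatches) are made up for this illustration, and exponent range (overflow, subnormals) is ignored.

```python
import operator
import random
from fractions import Fraction

def round_to_bits(x, p):
    """Round x to p significant bits, ties-to-even, unbounded exponent."""
    x = Fraction(x)
    if x == 0:
        return x
    sign = -1 if x < 0 else 1
    mag = abs(x)
    # Find e with 2**e <= mag < 2**(e + 1).
    e = mag.numerator.bit_length() - mag.denominator.bit_length()
    if Fraction(2) ** e > mag:
        e -= 1
    ulp = Fraction(2) ** (e - p + 1)
    # round() on a Fraction rounds half to even.
    return sign * round(mag / ulp) * ulp

def random_float32(rng):
    """A random value with a 24-bit significand (assumed binary32-like)."""
    m = rng.randrange(2**23, 2**24)
    e = rng.randrange(-30, 31)
    return rng.choice((-1, 1)) * m * Fraction(2) ** e

def double_rounding_mismatches(trials=2000, seed=1):
    """Count cases where rounding via 53 bits differs from rounding to 24 directly."""
    rng = random.Random(seed)
    ops = (operator.add, operator.sub, operator.mul, operator.truediv)
    bad = 0
    for _ in range(trials):
        x, y = random_float32(rng), random_float32(rng)
        for op in ops:
            exact = op(x, y)  # exact rational result
            via_double = round_to_bits(round_to_bits(exact, 53), 24)
            if via_double != round_to_bits(exact, 24):
                bad += 1
    return bad

print(double_rounding_mismatches())
```

A random search like this can only fail to find a counterexample; it is not a proof. The usual argument is that 53 bits is wide enough relative to 24 that the intermediate rounding cannot land a basic-operation result exactly on a 24-bit tie unless it was already on one.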

-----Original Message-----
From: stds-754@xxxxxxxx [mailto:stds-754@xxxxxxxx] On Behalf Of Golliver, Roger 
Sent: Thursday, March 31, 2011 4:01 PM
To: Peter Lawrence; STDS-754@xxxxxxxxxxxxxxxxx
Subject: RE: round32 ( round64 ( X ) ) ?= round32 ( X )


It is known, but not well known.
The Java language designers didn't know, and didn't talk to their own in-house 
numerics experts, like David Hough.

See above link for some suggestions for how to avoid the double rounding error.

-----Original Message-----
From: stds-754@xxxxxxxx [mailto:stds-754@xxxxxxxx] On Behalf Of Peter Lawrence
Sent: Thursday, March 31, 2011 12:55 PM
To: STDS-754@xxxxxxxxxxxxxxxxx
Subject: round32 ( round64 ( X ) ) ?= round32 ( X )

      my apologies in advance if this is trivial and/or nonsense, but I did 
not find the answer in a quick scan of David Goldberg's "What Every Computer 
Scientist Should Know About Floating-Point Arithmetic", nor in the other, more 
specifically IEEE 754, documents that I have.

consider the effect of first rounding (round-to-nearest-even) to some number of 
bits, followed by another rounding to a smaller number of bits. the question is 
whether that is always the same as rounding directly to the smaller number of bits.

is the following observation mathematically correct (under round-to-nearest-even)?

        the commas are for readability, the semicolons indicate where rounding 
is to take place:

         round to 10 bits, followed by round to 5:
             1.aaaa0,10000;0xxx  ==>  1.aaaa0,10000
             1.aaaa0;10000       ==>  1.aaaa0

         directly round to 5 bits:
             1.aaaa0;10000,0xxx  ==>  1.aaaa1

(the "0xxx" and "0000,0xxx" are trailing bits of some mathematically exact 
result which are not all zeros, and which would be represented by a non-zero 
"sticky bit" in an actual hardware implementation.  In the first case the 
sticky bit gets truncated; in the second case the sticky bit causes a round up.)

if the above is a correct observation, then

        round32 ( round64 ( X ) )   is not always equal to   round32 ( X )

which seems sort of counter-intuitive. at least I started out thinking it would 
always be equal, but thought I had better prove it first, and then came up with 
this counterexample. If it is true, I wonder if it is well known or not.
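
For a concrete check of the observation above, here is a minimal Python sketch using exact rational arithmetic. It assumes round32 and round64 mean rounding to 24 and 53 significant bits with ties-to-even, and it ignores exponent range; round_to_bits and the particular value X are chosen only for illustration.

```python
from fractions import Fraction

def round_to_bits(x, p):
    """Round x to p significant bits, ties-to-even, unbounded exponent."""
    x = Fraction(x)
    if x == 0:
        return x
    sign = -1 if x < 0 else 1
    mag = abs(x)
    # Find e with 2**e <= mag < 2**(e + 1).
    e = mag.numerator.bit_length() - mag.denominator.bit_length()
    if Fraction(2) ** e > mag:
        e -= 1
    ulp = Fraction(2) ** (e - p + 1)
    # round() on a Fraction rounds half to even.
    return sign * round(mag / ulp) * ulp

# X sits just above the midpoint between two adjacent 24-bit values:
# 1 + half an ulp (2**-24) + a small "sticky" remainder (2**-60).
X = 1 + Fraction(1, 2**24) + Fraction(1, 2**60)

twice = round_to_bits(round_to_bits(X, 53), 24)   # round64, then round32
direct = round_to_bits(X, 24)                     # round32 only

print(twice == direct)                       # False
print(direct - twice == Fraction(1, 2**23))  # True: off by one 24-bit ulp
```

The first rounding discards the 2**-60 remainder and leaves an exact tie, which ties-to-even then rounds down to 1; the direct rounding still sees the remainder and rounds up.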

Peter Lawrence.
