Re: Motion P1788/0024.01: Rounding Mode as Operation
I like the idea of being able to specify the rounding mode in each instruction, as long as one of the options is to use the dynamic rounding mode, but it isn't practical to require this. In addition to the cases on Hossam Fahmy's list, processors with vector registers (e.g., IBM's POWER7) would need rounding mode bits in every vector floating-point instruction. That is impractical to implement, because some architectures do not have enough opcode/subopcode bits to encode that many combinations.
I understand the desire for good Interval Arithmetic performance, but performance is up to the implementers. If IA is important to them, they will do what's practical to make it fast, balancing that decision against all their other requirements and wishes. If it is not important, they will not. If implementers have other ways either to switch rounding modes quickly or to reduce the need to switch (e.g., through smarter compilers), then they may do that instead.
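As one illustration of reducing the need to switch, here is a minimal sketch using Python's `decimal` module as a stand-in for hardware rounding modes. It relies on the identity that rounding x downward equals the negation of rounding -x upward, so both bounds of an interval sum can be computed while staying in round-toward-plus-infinity. The function names, the 3-digit `PRECISION`, and the tuple representation of intervals are assumptions made for the example, not anything a particular compiler or processor is known to do.

```python
from decimal import Decimal, localcontext, ROUND_CEILING, ROUND_FLOOR

PRECISION = 3  # hypothetical, deliberately tiny so that rounding is visible


def add_switching(a, b):
    """Interval sum of (lo, hi) pairs, switching mode between the two bounds."""
    with localcontext() as ctx:
        ctx.prec = PRECISION
        ctx.rounding = ROUND_FLOOR      # lower bound rounds toward -infinity
        lo = a[0] + b[0]
        ctx.rounding = ROUND_CEILING    # upper bound rounds toward +infinity
        hi = a[1] + b[1]
    return lo, hi


def add_single_mode(a, b):
    """Same interval sum, but staying in round-toward-plus-infinity throughout."""
    with localcontext() as ctx:
        ctx.prec = PRECISION
        ctx.rounding = ROUND_CEILING
        # round_down(x) == -round_up(-x); the negations are exact here
        # because the inputs already fit in PRECISION digits.
        lo = -((-a[0]) + (-b[0]))
        hi = a[1] + b[1]
    return lo, hi


a = (Decimal("1.00"), Decimal("1.01"))
b = (Decimal("0.005"), Decimal("0.006"))
print(add_switching(a, b), add_single_mode(a, b))
```

Both functions return the same enclosure for inputs that are already representable at the working precision; the second one never changes the rounding direction after setting it once.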
In addition to Michel Hack's two examples, a few PowerPC instructions include bits to set the rounding mode, with one of the options being the dynamic rounding mode, and a few instructions have hardwired rounding toward zero or in some other direction. These were chosen based on usefulness. On z/Architecture, Michel mentioned DFP, but there are also some BFP instructions that can set the rounding mode.
We should not require that specific assembler instruction names be used, any more than we should require that all people think in one specific language, regardless of the language(s) they speak. Very few programmers ever see, let alone write, assembler, and as the abstraction level is raised by new language features like Interval Arithmetic, fewer and fewer will or should. It is not just that we should not require it; there is simply no need to.
- Ian McIntosh IBM Canada Lab Compiler Back End Support and Development

From: "Hossam A. H. Fahmy" <hfahmy@xxxxxxxxxxxxxxxxxxxxxxx>
To: Ian McIntosh/Toronto/IBM@IBMCA
Date: 04/18/2011 01:50 PM
Subject: Re: Motion P1788/0024.01: Rounding Mode as Operation -- discussion period begins

Dear 1788 members,
The proposal of Prof. Kulisch has two points:
1. "that every future processor shall provide the 16 operations"
2. "dictate names for the corresponding assembler instructions"
As someone teaching processor design and specifically computer arithmetic, I see that the first point is much easier to standardize (but with different wording, see below) than the second. The first point is the more important one to the interval community, and if present it will enable compiler writers to generate efficient code for the target processor.
We cannot dictate instruction names across so many different implementations, present and future. The nomenclature of the various processor instruction sets is quite different, and we would practically be asking the HW vendors to avoid supporting 1788 if we impose unnecessary requirements. So, I propose that we stick to the first point only.
The wording I propose for the first point is:
"every 1788 compliant system shall provide the ...."
I say "1788 compliant" because we obviously cannot control processors that choose to stay away from 1788 compliance.
I say "system" because we chose 1788 to be on top of 754. The latter speaks about a complete system that might be either SW only, or a combination of SW and HW. I understand that a SW-only system will be slow, but shall we deny compliance to such an implementation? What if a programmer wants to provide the functionality of 1788 on a current processor family that does not have the required HW support? Shall we tell such a person: "No, your implementation is not conformant because it does not have HW"?
If we want to encourage the HW implementation we can say
"and should be implemented directly in hardware for efficiency reasons"
but I do not recommend that we demand that the implementation must be in hardware.
The last issue I would like to raise in this email is the specification of only 16 operations.
Prof. Kulisch specifies
+ - * / Round to Nearest ties to Even (RNE)
+> -> *> /> Round toward Plus Infinity (RPI)
+< -< *< /< Round toward Minus Infinity (RMI)
+| -| *| /| Round toward zero (RTZ)
This seems to imply only a binary floating-point implementation. To accommodate decimal floating-point implementations, Round to Nearest ties Away from Zero must also be supplied (this rounding direction is a requirement for decimal and is optional for binary in 754). So, we should have at least an additional row of four more operations with this fifth rounding direction, giving 20 operations. Moreover, if implemented in real HW, the decimal operations will be distinct from the binary operations, which makes a total of 40. Furthermore, each operand size (binary32, binary64, binary128, decimal64, decimal128) most probably also has a distinct set of operations in real HW, so if 1788 wants to specify them all explicitly, it becomes at least 20*5=100 operations. I think that we cannot practically list all these *explicitly* in the body of 1788. We may need to think of a way to present the idea in less space.
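The count of 20 operations per format and 100 over the five formats can be checked with a short enumeration. This is only a sketch of the arithmetic in the paragraph above; the list names and the `count_variants` helper are made up for the example, and the rounding abbreviations follow the ones used in this message.

```python
from itertools import product

# The four basic operations and the five rounding directions discussed above.
OPERATIONS = ["+", "-", "*", "/"]
ROUNDINGS = ["RNE", "RNA", "RPI", "RMI", "RTZ"]
# The five operand formats named in the message.
FORMATS = ["binary32", "binary64", "binary128", "decimal64", "decimal128"]


def count_variants(operations, roundings, formats):
    """Count every (operation, rounding, format) combination explicitly."""
    return sum(1 for _ in product(operations, roundings, formats))


per_format = len(OPERATIONS) * len(ROUNDINGS)
total = count_variants(OPERATIONS, ROUNDINGS, FORMATS)
print(per_format, total)
```

Adding sqrt and fma to `OPERATIONS`, or more formats to `FORMATS`, shows how quickly an explicit listing grows, which is the argument for a more compact presentation.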
Dan, Michel, and other folks from 754 please correct me on the above if I erred.
Another wrinkle that I would like to add is what we discussed several times in the 754 list regarding dynamic rounding (the most recent being on 25 May 2010 as far as I can tell). Would it be better to specify a row with "an inherited" rounding as follows?
+e -e *e /e Round to Nearest ties to Even (RNE)
+a -a *a /a Round to Nearest ties to Away (RNA)
+> -> *> /> Round toward Plus Infinity (RPI)
+< -< *< /< Round toward Minus Infinity (RMI)
+| -| *| /| Round toward zero (RTZ)
+ - * / Round to whatever is the current "dynamic/global" direction
The last row specifies operations that take whatever global rounding direction there is from their environment. If used within a function, such operations "inherit" the rounding direction of the caller. Will such a scheme enable an easy implementation of both the static (clause 4.1 of 754) and dynamic (clause 4.2 of 754) modes as well as interval work?
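To make the distinction concrete, here is a minimal sketch that uses Python's `decimal` module as a stand-in for a decimal floating-point unit. The suffixes e, a, >, < and | are the ones from the table above; the function name `add`, the dictionary `STATIC_ROUNDING`, and the 3-digit `PRECISION` are assumptions made for the example. An empty suffix plays the role of the last row: the operation takes whatever rounding direction its caller has set.

```python
from decimal import (Decimal, localcontext, ROUND_HALF_EVEN, ROUND_HALF_UP,
                     ROUND_CEILING, ROUND_FLOOR, ROUND_DOWN)

PRECISION = 3  # hypothetical, deliberately tiny so that rounding is visible

# Suffixes from the table mapped to the decimal module's rounding constants.
STATIC_ROUNDING = {
    "e": ROUND_HALF_EVEN,  # RNE
    "a": ROUND_HALF_UP,    # RNA (ties go away from zero)
    ">": ROUND_CEILING,    # RPI
    "<": ROUND_FLOOR,      # RMI
    "|": ROUND_DOWN,       # RTZ
}


def add(x, y, suffix=""):
    """Add x and y; an empty suffix inherits the caller's rounding direction."""
    with localcontext() as ctx:  # starts as a copy of the caller's context
        ctx.prec = PRECISION
        if suffix:
            ctx.rounding = STATIC_ROUNDING[suffix]
        return x + y


x, y = Decimal("-1.00"), Decimal("-0.005")  # exact sum is -1.005, a tie

# Static: the rounding direction is part of the operation itself.
static_results = {s: add(x, y, s) for s in STATIC_ROUNDING}

# Dynamic: the plain operation inherits the direction set by its caller.
with localcontext() as caller:
    caller.rounding = ROUND_FLOOR
    inherited = add(x, y)

print(static_results, inherited)
```

In this sketch the same function serves both styles, which suggests that a single "inherited" row can coexist with the explicitly rounded rows without conflict.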
Finally, why stop at those four basic operations only? In many of the examples exchanged on the 1788 list we spoke about sqrt, for example. Both sqrt and fma are required operations in 754. Shall we include them and make each row consist of 6 columns? Shall we also include the other functions recommended in clause 9 of 754? If we include everything I mentioned in this message, we will definitely not list the variants explicitly in 1788, but will need to think of a way to present them concisely.
I guess that I gave you all enough food for thought. I will appreciate any corrections to my ideas so that we all reach a much better 1788.
Thanks
--
Hossam A. H. Fahmy
Associate Professor
Electronics and Communications Department
Cairo University
Egypt