
Re: [IEEE P1788 er subgroup]: Baker thinks I can bring you to a focus...




Intel's 80-bit floating point implementation is a complication, and at some point we need to think about it, but it is not as bad as it sounds.

In traditional IA-32, 32-bit single and 64-bit double precision values are widened to 80-bit double extended precision when they are loaded into registers, and narrowed with rounding when they are stored.  Normally that works reasonably well.  There are some cases where it doesn't:

1. It can make it very difficult to get the same answers on IA-32 as on other architectures, and vice versa.

2. It can also make it difficult to get the same answer when some compilers optimize, or even when you change other nearby source slightly, and especially when you switch compiler versions or vendors.  If the compiler runs out of registers it will "spill" one or more values to memory and reload them, or access them from memory, later when needed.  There are several subproblems:
     - Which expressions are spilled may vary with compiler options, the compiler version, or other code in the same function.
     - If the compiler spills to double precision or single precision temporaries, spilling loses precision.
     - The programmer can't predict which expressions will be spilled.
     - The rounding done during a spill uses whichever rounding mode is in effect at that point (that's the way the hardware works), not necessarily the rounding mode that should apply to that expression.

3. It can make it difficult to get the same answer when using SSE operations like ADDPD (generated by some compilers for the intrinsic "_mm_add_pd", which operates on __m128d values, and/or generated automatically).  These instructions operate on non-widened values: four 32-bit single precision or two 64-bit double precision values in a vector register.


One solution is for the compiler to always use 80-bit spills (except for SSE); context switches always have to do that anyway.  But programmers have some control too:

a. If you explicitly cast a value to "long double", it must be handled as 80 bits.  That generally would not affect performance, and it would eliminate rounding surprises.
b. If you explicitly cast it to "double" between operations, the expression must be narrowed at those points.  That generally would hurt performance noticeably, but it would both eliminate rounding surprises and give portable results matching other architectures (ignoring library differences, etc).
c. If you restrict yourself to newer CPUs with SSE and to newer compilers that support SSE, you can do most operations on pairs of 64-bit values as long as they use the same rounding mode.  That would improve performance, and it would both eliminate rounding surprises and give portable results matching other architectures (ignoring library differences, etc).

So there are reasonable and practical solutions.  The required complications would live inside the subroutines and class functions implementing Interval Arithmetic operations, not in the user code using them, so it is a one-time cost and can be handled by people who know what to do and who are already using special functions to control the rounding mode.  The standard should mention that implementers must take care of this on architectures where it matters, but that users would not be affected unless they chose implementation options that give faster but only approximately bounded answers.

Since those writing the implementation classes and modules can and must control this, it isn't necessary to have IA types built into languages.  In languages like C, building IA types in would improve usability.  In extensible languages like C++, classes can work fine.


I mentioned library differences.  That's a whole other topic, but the solution is to have one carefully written, portable library, tailored where necessary or beneficial (eg, for SSE) to use the best code that gives bit-for-bit reproducible results everywhere.  Java did it (although in some cases it gives the wrong answer everywhere), and IA can too.

Anyone who wants to can also produce faster versions that sacrifice accuracy, for anybody who values speed over correctness, portability and reproducibility, or smarter versions giving tighter bounds, for anyone who values those more than portability and reproducibility; but the default should be portability.

That is doable.  Michel Hack and I worked on some parts of IBM's DFP library (with others, especially Jim Shearer), which is written in C.  It is essentially decimal digit for decimal digit portable between z/Architecture (evolved from the CISC System/360 mainframe architecture) and the Power6 (PowerPC architecture, evolved from the POWER RISC architecture).  There are differences, like supporting the Hex format on z/Architecture but not on PowerPC, some low level functions and code to access specific hardware instructions, some things done in software on one and in microcode on the other, and similar architecture tailoring, but otherwise it's the same source code.

- Ian          Toronto IBM Lab   8200 Warden   D2-445   905-413-3411

----- Forwarded by Ian McIntosh/Toronto/IBM on 12/03/2009 05:37 PM -----
Van Snyder <Van.Snyder@xxxxxxxxxxxx>
Sent by: owner-stds-1788-er@xxxxxxxxxxxxxxxxxxxxx

12/03/2009 02:35 PM
Please respond to
Van.Snyder@xxxxxxxxxxxx

To
Christian Keil <c.keil@xxxxxxxxxxxxx>
cc
"stds-1788-er@xxxxxxxxxxxxxxxxxxxxx" <stds-1788-er@xxxxxxxxxxxxxxxxxxxxx>
Subject
Re: [IEEE P1788 er subgroup]: Baker thinks I can bring you to a focus...






On Wed, 2009-03-11 at 20:12 -0700, Christian Keil wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Van Snyder wrote:
> > Which brings up another thing that compiler writers will need to worry
> > about:  It's obvious that arithmetic operations on different boundaries
> > of intervals have to round differently.  What's not so obvious is that
> > when wide registers are spilled into narrow memory, their contents have
> > to be rounded differently depending upon which interval boundary they
> > are.  The standard should mention this.  That this can happen at
> > different stages of a computation, depending upon the optimization
> > strategy in play when a program unit is compiled, is the origin of the
> > impossibility (or at least extreme difficulty or expense) of achieving
> > bit reproducibility.
> That's indeed a scary and important thing. I think somewhere I read that
> for example gcc (at least in older versions---I am missing the
> reference) does not honor the current rounding mode when spilling
> registers. So even if you would magically know that some register is
> spilled, changing rounding mode wouldn't help you.
> What does that imply for our standard? This looks like we have to
> somehow attach an attribute to the FP value of a bound?

I think it means that intervals have to be intrinsic types in
programming languages.  They can't be built up by C++ classes or Fortran
2003 modules, because doing so doesn't tell the compiler which FP number
is a bound (and which bound) and which is just a point.  One might
suggest that an alternative is for languages to provide attributes for
FP variables that say "always round up (or down)" when spilling from
register to memory, but this doesn't tackle anonymous temporaries.

>  Maybe it's not
> our task (neither subgroup nor maybe P1788) to figure out in detail, but
> if we have an IA library that simply builds on FP arithmetic then this
> might indeed prove difficult. How should the compiler know that some FP
> value is a lower/upper bound? What happens on context switches when the
> runtime system saves and restores the context? Are registers preserved
> with their exact value or also spilled here? We should definitely put
> this on our list!
>
> Christian
>
> - --
>                              /"\
> Christian Keil               \ /    ASCII Ribbon Campaign
> mail:c.keil@xxxxxxxxxxxxx     X  against HTML email & vCards
>                              / \


----- Forwarded by Ian McIntosh/Toronto/IBM on 12/03/2009 05:37 PM -----
John Pryce <j.d.pryce@xxxxxxxxxxxx>
Sent by: owner-stds-1788-er@xxxxxxxxxxxxxxxxxxxxx

12/03/2009 06:17 AM

To
P1788-er <stds-1788-er@xxxxxxxxxxxxxxxxxxxxx>
cc
Subject
Re: [IEEE P1788 er subgroup]: Baker thinks I can bring you to a focus...





Hi folks

On 12 Mar 2009, at 03:12, Christian Keil wrote:
> Van Snyder wrote:
>> Which brings up another thing that compiler writers will need to  
>> worry
>> about:  It's obvious that arithmetic operations on different  
>> boundaries
>> of intervals have to round differently.  What's not so obvious is  
>> that
>> when wide registers are spilled into narrow memory, their contents  
>> have
>> to be rounded differently depending upon which interval boundary they
>> are.  The standard should mention this.  That this can happen at
>> different stages of a computation, depending upon the optimization
>> strategy in play when a program unit is compiled, is the origin of  
>> the
>> impossibility (or at least extreme difficulty or expense) of  
>> achieving
>> bit reproducibility.
> That's indeed a scary and important thing.

Scary indeed. Who was saying P1788 doesn't need to be concerned with  
level 4
(bit level and compiler)? Level 4 will screw up containment if we don't
handle such issues.

John

----- Forwarded by Ian McIntosh/Toronto/IBM on 12/03/2009 05:37 PM -----
Christian Keil <c.keil@xxxxxxxxxxxxx>
Sent by: owner-stds-1788-er@xxxxxxxxxxxxxxxxxxxxx

11/03/2009 11:12 PM

To
cc
"stds-1788-er@xxxxxxxxxxxxxxxxxxxxx" <stds-1788-er@xxxxxxxxxxxxxxxxxxxxx>
Subject
Re: [IEEE P1788 er subgroup]: Baker thinks I can bring you to a focus...





-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Van Snyder wrote:
> Which brings up another thing that compiler writers will need to worry
> about:  It's obvious that arithmetic operations on different boundaries
> of intervals have to round differently.  What's not so obvious is that
> when wide registers are spilled into narrow memory, their contents have
> to be rounded differently depending upon which interval boundary they
> are.  The standard should mention this.  That this can happen at
> different stages of a computation, depending upon the optimization
> strategy in play when a program unit is compiled, is the origin of the
> impossibility (or at least extreme difficulty or expense) of achieving
> bit reproducibility.

That's indeed a scary and important thing. I think somewhere I read that
for example gcc (at least in older versions---I am missing the
reference) does not honor the current rounding mode when spilling
registers. So even if you would magically know that some register is
spilled, changing rounding mode wouldn't help you.
What does that imply for our standard? This looks like we have to
somehow attach an attribute to the FP value of a bound? Maybe it's not
our task (neither subgroup nor maybe P1788) to figure out in detail, but
if we have an IA library that simply builds on FP arithmetic then this
might indeed prove difficult. How should the compiler know that some FP
value is a lower/upper bound? What happens on context switches when the
runtime system saves and restores the context? Are registers preserved
with their exact value or also spilled here? We should definitely put
this on our list!

Christian

- --
                            /"\
Christian Keil               \ /    ASCII Ribbon Campaign
mail:c.keil@xxxxxxxxxxxxx     X  against HTML email & vCards
                            / \