
Re: [IEEE P1788 er subgroup]: Baker thinks I can bring you to a focus...




Intel's 80-bit floating point implementation is a complication, and at some point we need to think about it, but it is not as bad as it sounds.

In traditional IA-32, 32-bit single and 64-bit double precision values are widened to 80-bit double extended precision when they are loaded into registers, and narrowed with rounding when they are stored.  Normally that works reasonably well.  There are some cases where it doesn't:

1. It can make it very difficult to get the same answers on IA-32 as on other architectures, and vice versa.

2. It can also make it difficult to get the same answer when some compilers optimize, or even when you change other nearby source slightly, and especially when you switch compiler versions or vendors.  If the compiler runs out of registers it will "spill" one or more values to memory and reload them, or access them from memory, later when needed.  There are several subproblems:
     - Which expressions are spilled may vary with compiler options, the compiler version, or other code in the same function.
     - If the compiler spills to double precision or single precision temporaries, spilling loses precision.
     - The programmer can't predict which expressions will be spilled.
     - The rounding done during a spill uses whichever rounding mode is in effect at that point (that's the way the hardware works), not necessarily the rounding mode that should apply to that expression.

3. It can make it difficult to get the same answer when using SSE operations like ADDPD (generated by some compilers for the intrinsic "_mm_add_pd", which operates on __m128d values, and/or generated automatically).  These instructions operate on non-widened values: four 32-bit single precision or two 64-bit double precision values in a vector register.


One solution is for the compiler to always use 80-bit spills (except for SSE); context switches always have to do that anyway.  But programmers have some control too:

a. If you explicitly cast a value to "long double", it must be handled as 80 bits.  That generally would not affect performance, and it would eliminate rounding surprises.
b. If you explicitly cast it to "double" between operations, the expression must be narrowed at those points.  That generally would hurt performance noticeably, but it would both eliminate rounding surprises and give portable results matching other architectures (ignoring library differences, etc).
c. If you restrict yourself to newer CPUs with SSE and to newer compilers that support SSE, you can do most operations on pairs of 64-bit values as long as they use the same rounding mode.  That would improve performance, and it would both eliminate rounding surprises and give portable results matching other architectures (ignoring library differences, etc).

So there are reasonable and practical solutions.  The required complications would live inside the subroutines and class functions implementing Interval Arithmetic operations, not in the user code using them, so it is a one-time cost and can be handled by people who know what to do and who are already using special functions to control the rounding mode.  The standard should mention that implementers must take care of this on architectures where it matters, but that users would not be affected unless they chose implementation options that give faster but only approximately bounded answers.

Since those writing the implementation classes and modules can and must control this, it isn't necessary to have IA types built into languages.  In languages like C, building IA types in would improve usability.  In extensible languages like C++, classes can work fine.


I mentioned library differences.  That's a whole other topic, but the solution is to have one carefully written, portable library, tailored where necessary or beneficial (eg, for SSE) to use the best code that gives bit-for-bit reproducible results everywhere.  Java did it (although in some cases it gives the wrong answer everywhere), and IA can too.

Anyone who wants to can also produce faster versions that sacrifice accuracy, for anybody who values speed over correctness, portability and reproducibility, or smarter versions giving tighter bounds, for anyone who values those more than portability and reproducibility; but the default should be portability.

That is doable.  Michel Hack and I worked on some parts of IBM's DFP library (with others, especially Jim Shearer), which is written in C.  It is essentially decimal digit for decimal digit portable between z/Architecture (evolved from the CISC System/360 mainframe architecture) and the Power6 (PowerPC architecture, evolved from the POWER RISC architecture).  There are differences, like supporting the Hex format on z/Architecture but not on PowerPC, some low level functions and code to access specific hardware instructions, some things done in software on one and in microcode on the other, and similar architecture tailoring, but otherwise it's the same source code.

- Ian          Toronto IBM Lab   8200 Warden   D2-445   905-413-3411

----- Forwarded by Ian McIntosh/Toronto/IBM on 12/03/2009 05:37 PM -----
Van Snyder <Van.Snyder@xxxxxxxxxxxx>
Sent by: owner-stds-1788-er@xxxxxxxxxxxxxxxxxxxxx

12/03/2009 02:35 PM
Please respond to
Van.Snyder@xxxxxxxxxxxx

To
Christian Keil <c.keil@xxxxxxxxxxxxx>
cc
"stds-1788-er@xxxxxxxxxxxxxxxxxxxxx" <stds-1788-er@xxxxxxxxxxxxxxxxxxxxx>
Subject
Re: [IEEE P1788 er subgroup]: Baker thinks I can bring you to a focus...






On Wed, 2009-03-11 at 20:12 -0700, Christian Keil wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Van Snyder wrote:
> > Which brings up another thing that compiler writers will need to worry
> > about:  It's obvious that arithmetic operations on different boundaries
> > of intervals have to round differently.  What's not so obvious is that
> > when wide registers are spilled into narrow memory, their contents have
> > to be rounded differently depending upon which interval boundary they
> > are.  The standard should mention this.  That this can happen at
> > different stages of a computation, depending upon the optimization
> > strategy in play when a program unit is compiled, is the origin of the
> > impossibility (or at least extreme difficulty or expense) of achieving
> > bit reproducibility.
> That's indeed a scary and important thing. I think somewhere I read that
> for example gcc (at least in older versions---I am missing the
> reference) does not honor the current rounding mode when spilling
> registers. So even if you would magically know that some register is
> spilled, changing rounding mode wouldn't help you.
> What does that imply for our standard? This looks like we have to
> somehow attach an attribute to the FP value of a bound?

I think it means that intervals have to be intrinsic types in
programming languages.  They can't be built up by C++ classes or Fortran
2003 modules, because doing so doesn't tell the compiler which FP number
is a bound (and which bound) and which is just a point.  One might
suggest that an alternative is for languages to provide attributes for
FP variables that say "always round up (or down)" when spilling from
register to memory, but this doesn't tackle anonymous temporaries.

>  Maybe it's not
> our task (neither subgroup nor maybe P1788) to figure out in detail, but
> if we have an IA library that simply builds on FP arithmetic then this
> might indeed prove difficult. How should the compiler know that some FP
> value is a lower/upper bound? What happens on context switches when the
> runtime system saves and restores the context? Are registers preserved
> with their exact value or also spilled here? We should definitely put
> this on our list!
>
> Christian
>
> - --
>                              /"\
> Christian Keil               \ /    ASCII Ribbon Campaign
> mail:c.keil@xxxxxxxxxxxxx     X  against HTML email & vCards
>                              / \


----- Forwarded by Ian McIntosh/Toronto/IBM on 12/03/2009 05:37 PM -----
John Pryce <j.d.pryce@xxxxxxxxxxxx>
Sent by: owner-stds-1788-er@xxxxxxxxxxxxxxxxxxxxx

12/03/2009 06:17 AM

To
P1788-er <stds-1788-er@xxxxxxxxxxxxxxxxxxxxx>
cc
Subject
Re: [IEEE P1788 er subgroup]: Baker thinks I can bring you to a focus...





Hi folks

On 12 Mar 2009, at 03:12, Christian Keil wrote:
> Van Snyder wrote:
>> Which brings up another thing that compiler writers will need to  
>> worry
>> about:  It's obvious that arithmetic operations on different  
>> boundaries
>> of intervals have to round differently.  What's not so obvious is  
>> that
>> when wide registers are spilled into narrow memory, their contents  
>> have
>> to be rounded differently depending upon which interval boundary they
>> are.  The standard should mention this.  That this can happen at
>> different stages of a computation, depending upon the optimization
>> strategy in play when a program unit is compiled, is the origin of  
>> the
>> impossibility (or at least extreme difficulty or expense) of  
>> achieving
>> bit reproducibility.
> That's indeed a scary and important thing.

Scary indeed. Who was saying P1788 doesn't need to be concerned with  
level 4
(bit level and compiler)? Level 4 will screw up containment if we don't
handle such issues.

John

----- Forwarded by Ian McIntosh/Toronto/IBM on 12/03/2009 05:37 PM -----
Christian Keil <c.keil@xxxxxxxxxxxxx>
Sent by: owner-stds-1788-er@xxxxxxxxxxxxxxxxxxxxx

11/03/2009 11:12 PM

To
cc
"stds-1788-er@xxxxxxxxxxxxxxxxxxxxx" <stds-1788-er@xxxxxxxxxxxxxxxxxxxxx>
Subject
Re: [IEEE P1788 er subgroup]: Baker thinks I can bring you to a focus...





-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Van Snyder wrote:
> Which brings up another thing that compiler writers will need to worry
> about:  It's obvious that arithmetic operations on different boundaries
> of intervals have to round differently.  What's not so obvious is that
> when wide registers are spilled into narrow memory, their contents have
> to be rounded differently depending upon which interval boundary they
> are.  The standard should mention this.  That this can happen at
> different stages of a computation, depending upon the optimization
> strategy in play when a program unit is compiled, is the origin of the
> impossibility (or at least extreme difficulty or expense) of achieving
> bit reproducibility.

That's indeed a scary and important thing. I think somewhere I read that
for example gcc (at least in older versions---I am missing the
reference) does not honor the current rounding mode when spilling
registers. So even if you would magically know that some register is
spilled, changing rounding mode wouldn't help you.
What does that imply for our standard? This looks like we have to
somehow attach an attribute to the FP value of a bound? Maybe it's not
our task (neither subgroup nor maybe P1788) to figure out in detail, but
if we have an IA library that simply builds on FP arithmetic then this
might indeed prove difficult. How should the compiler know that some FP
value is a lower/upper bound? What happens on context switches when the
runtime system saves and restores the context? Are registers preserved
with their exact value or also spilled here? We should definitely put
this on our list!

Christian

- --
                            /"\
Christian Keil               \ /    ASCII Ribbon Campaign
mail:c.keil@xxxxxxxxxxxxx     X  against HTML email & vCards
                            / \