Intel's 80 bit floating point implementation is a complication, and at
some point we needed to think about it, but it is not as bad as it sounds.
In traditional IA-32, 32 bit single and 64 bit double precision are
widened to 80 bit double extended precision when they are loaded into
registers, and narrowed with rounding when they are stored. Normally
that works reasonably well. There are some cases where it doesn't:
1. First, it can make it very difficult to get the same answers on IA-32
as on other architectures, and vice versa.
2. It can also make it difficult to get the same answer when some
compilers optimize, or even when you change other nearby source
slightly, and especially when you switch compiler versions or vendors.
If the compiler runs out of registers it will "spill" one or more values
to memory, then reload them or access them from memory later when
needed. There are several subproblems:
- The expressions spilled may vary depending on compiler options or the
compiler version or other code in the same function.
- If the compiler spills to double precision or single precision
temporaries, spilling loses precision.
- The programmer can't predict which expressions will be spilled.
- The rounding done during spill will be whatever rounding mode is in
effect at that point (that's the way the hardware works), not
necessarily the rounding mode that should apply to that expression.
3. It can make it difficult to get the same answer when using SSE
operations like ADDPD (generated by some compilers for built in
functions like "_mm_add_pd" and/or automatically). These instructions
operate on non-widened values - four 32 bit single precision or two 64
bit double precision values in a vector register.
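The spill problem in case 2 can be seen in a few lines of C. This is an
illustrative sketch, not part of the original discussion; the function
names are made up. The exact sum 1 + 2^-53 + 2^-78 rounds differently
depending on whether it is rounded once, directly to double, or twice,
first to 80 bit double extended and then to double:

```c
/* Direct double addition: one rounding.  The exact sum
 * 1 + 2^-53 + 2^-78 lies just above the midpoint between 1.0 and
 * 1.0 + 2^-52, so round-to-nearest rounds up to 1.0 + 2^-52. */
double direct_sum(void) {
    volatile double one = 1.0;                 /* volatile: force runtime add */
    volatile double tiny = 0x1p-53 + 0x1p-78;  /* exactly representable */
    return one + tiny;
}

/* Through a wider intermediate: two roundings.  With x87 80 bit
 * long double, the 2^-78 bit is below half an ulp and is lost in the
 * first rounding; the second rounding then sees an exact tie and
 * round-to-even falls back to 1.0. */
double via_extended(void) {
    volatile long double one = 1.0L;
    volatile long double tiny = 0x1p-53L + 0x1p-78L;
    return (double)(one + tiny);
}
```

On IA-32 with 80 bit long double, via_extended() returns 1.0 while
direct_sum() returns 1.0 + 2^-52; on platforms where long double is
quad precision or plain double, the two agree. This is exactly the
kind of difference a value kept in an 80 bit register on one path and
spilled to a 64 bit temporary on another can introduce.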
One solution is that the compiler should always use 80 bit spills
(except for SSE); context switches always have to do that anyway. But
programmers have some control too:
a. If you explicitly cast a value to "long double", it must be handled
as 80 bits. That generally would not affect performance, and it would
eliminate rounding surprises.
b. If you explicitly cast it to "double" between operations, the
expression must be narrowed at those points. That generally would hurt
performance noticeably, but it would both eliminate rounding surprises
and give portable results matching other architectures (ignoring library
differences, etc).
c. If you restrict yourself to newish CPUs with SSE and to newish
compilers that support SSE, you can do most operations on pairs of 64
bit values as long as they use the same rounding mode. That would
improve performance, and it would both eliminate rounding surprises and
give portable results matching other architectures (ignoring library
differences, etc).
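Option (c) can be sketched as follows. This is an illustrative example
with a made-up helper name (add2), not proposal text. With SSE2
available, both doubles are added by one ADDPD with no 80 bit widening;
the fallback path gives the same non-widened results on architectures
where double expressions are not evaluated in extended precision:

```c
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: add two pairs of doubles element-wise.
 * With SSE2 this is a single ADDPD on non-widened 64 bit values,
 * so there are no 80 bit intermediates to spill and re-round. */
void add2(const double a[2], const double b[2], double out[2]) {
#if defined(__SSE2__)
    _mm_storeu_pd(out, _mm_add_pd(_mm_loadu_pd(a), _mm_loadu_pd(b)));
#else
    out[0] = a[0] + b[0];   /* plain double adds: one rounding each */
    out[1] = a[1] + b[1];
#endif
}
```

For interval arithmetic the natural layout is one interval per vector
register, with the two 64 bit lanes holding the lower and upper bounds,
which is why the "same rounding mode" restriction matters: ADDPD rounds
both lanes with the single mode in MXCSR.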
So there are reasonable and practical solutions. The required
complications would be inside the subroutines and class functions
implementing Interval Arithmetic operations, not in the user code using
them, so it is a one-time cost and can be done by people who know what
to do and are already using special functions to control the rounding mode. The
standard should mention that implementers must take care of this on
architectures where it matters, but that users would not be affected
unless they chose to use implementation options to give faster but only
approximately bounded answers.
Since those writing the implementation classes and modules can and must
control this, it isn't necessary to have IA types built into languages.
In languages like C, building IA types in would improve usability. In
extensible languages like C++, classes can work fine.
I mentioned library differences. That's a whole other topic, but the
solution is to have one carefully written and portable library, tailored
where necessary or beneficial (eg, for SSE) to use the best code to give
bit for bit reproducible results everywhere. Java did it (although in
some cases they give the wrong answer everywhere), and IA can too.
Anyone who wants to can also produce faster versions that sacrifice
accuracy for anybody who values speed over correctness, portability and
reproducibility, or smarter versions giving tighter bounds for anyone
who values that more than portability and reproducibility, but the
default should be portability.
That is doable. Michel Hack and I worked on some parts of IBM's DFP
library (with others, especially Jim Shearer), which is written in C. It
is essentially decimal digit for decimal digit portable between
z/Architecture (evolved from the CISC System/360 mainframe architecture)
and the Power6 (PowerPC architecture evolved from the POWER RISC
architecture). There are differences like supporting Hex format on
z/Architecture but not on PowerPC, some low level functions and code to
access specific hardware instructions, some things done in software on
one and microcode on the other, and similar architecture tailoring, but
otherwise it's the same source code.
- Ian Toronto IBM Lab 8200 Warden D2-445 905-413-3411
----- Forwarded by Ian McIntosh/Toronto/IBM on 12/03/2009 05:37 PM -----
*Van Snyder <Van.Snyder@xxxxxxxxxxxx>*
Sent by: owner-stds-1788-er@xxxxxxxxxxxxxxxxxxxxx
12/03/2009 02:35 PM
Please respond to
Van.Snyder@xxxxxxxxxxxx
To
Christian Keil <c.keil@xxxxxxxxxxxxx>
cc
"stds-1788-er@xxxxxxxxxxxxxxxxxxxxx" <stds-1788-er@xxxxxxxxxxxxxxxxxxxxx>
Subject
Re: [IEEE P1788 er subgroup]: Baker thinks I can bring you to a focus...
On Wed, 2009-03-11 at 20:12 -0700, Christian Keil wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Van Snyder wrote:
> > Which brings up another thing that compiler writers will need to worry
> > about: It's obvious that arithmetic operations on different boundaries
> > of intervals have to round differently. What's not so obvious is that
> > when wide registers are spilled into narrow memory, their contents have
> > to be rounded differently depending upon which interval boundary they
> > are. The standard should mention this. That this can happen at
> > different stages of a computation, depending upon the optimization
> > strategy in play when a program unit is compiled, is the origin of the
> > impossibility (or at least extreme difficulty or expense) of achieving
> > bit reproducibility.
> That's indeed a scary and important thing. I think somewhere I read that
> for example gcc (at least in older versions---I am missing the
> reference) does not honor the current rounding mode when spilling
> registers. So even if you would magically know that some register is
> spilled, changing rounding mode wouldn't help you.
> What does that imply for our standard? This looks like we have to
> somehow attach an attribute to the FP value of a bound?
I think it means that intervals have to be intrinsic types in
programming languages. They can't be built up by C++ classes or Fortran
2003 modules, because doing so doesn't tell the compiler which FP number
is a bound (and which bound) and which is just a point. One might
suggest that an alternative is for languages to provide attributes for
FP variables that say "always round up (or down)" when spilling from
register to memory, but this doesn't tackle anonymous temporaries.
> Maybe it's not
> our task (neither subgroup nor maybe P1788) to figure out in detail, but
> if we have an IA library that simply builds on FP arithmetic then this
> might indeed prove difficult. How should the compiler know that some FP
> value is a lower/upper bound? What happens on context switches when the
> runtime system saves and restores the context? Are registers preserved
> with their exact value or also spilled here? We should definitely put
> this on our list!
>
> Christian
>
> - --
> /"\
> Christian Keil \ / ASCII Ribbon Campaign
> mail:c.keil@xxxxxxxxxxxxx X against HTML email & vCards
> / \
----- Forwarded by Ian McIntosh/Toronto/IBM on 12/03/2009 05:37 PM -----
*John Pryce <j.d.pryce@xxxxxxxxxxxx>*
Sent by: owner-stds-1788-er@xxxxxxxxxxxxxxxxxxxxx
12/03/2009 06:17 AM
To
P1788-er <stds-1788-er@xxxxxxxxxxxxxxxxxxxxx>
cc
Subject
Re: [IEEE P1788 er subgroup]: Baker thinks I can bring you to a focus...
Hi folks
On 12 Mar 2009, at 03:12, Christian Keil wrote:
> Van Snyder wrote:
>> Which brings up another thing that compiler writers will need to
>> worry
>> about: It's obvious that arithmetic operations on different
>> boundaries
>> of intervals have to round differently. What's not so obvious is
>> that
>> when wide registers are spilled into narrow memory, their contents
>> have
>> to be rounded differently depending upon which interval boundary they
>> are. The standard should mention this. That this can happen at
>> different stages of a computation, depending upon the optimization
>> strategy in play when a program unit is compiled, is the origin of
>> the
>> impossibility (or at least extreme difficulty or expense) of
>> achieving
>> bit reproducibility.
> That's indeed a scary and important thing.
Scary indeed. Who was saying P1788 doesn't need to be concerned with
level 4 (bit level and compiler)? Level 4 will screw up containment if
we don't handle such issues.
John
----- Forwarded by Ian McIntosh/Toronto/IBM on 12/03/2009 05:37 PM -----
*Christian Keil <c.keil@xxxxxxxxxxxxx>*
Sent by: owner-stds-1788-er@xxxxxxxxxxxxxxxxxxxxx
11/03/2009 11:12 PM
To
cc
"stds-1788-er@xxxxxxxxxxxxxxxxxxxxx" <stds-1788-er@xxxxxxxxxxxxxxxxxxxxx>
Subject
Re: [IEEE P1788 er subgroup]: Baker thinks I can bring you to a focus...
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Van Snyder wrote:
> Which brings up another thing that compiler writers will need to worry
> about: It's obvious that arithmetic operations on different boundaries
> of intervals have to round differently. What's not so obvious is that
> when wide registers are spilled into narrow memory, their contents have
> to be rounded differently depending upon which interval boundary they
> are. The standard should mention this. That this can happen at
> different stages of a computation, depending upon the optimization
> strategy in play when a program unit is compiled, is the origin of the
> impossibility (or at least extreme difficulty or expense) of achieving
> bit reproducibility.
That's indeed a scary and important thing. I think somewhere I read that
for example gcc (at least in older versions---I am missing the
reference) does not honor the current rounding mode when spilling
registers. So even if you would magically know that some register is
spilled, changing rounding mode wouldn't help you.
What does that imply for our standard? This looks like we have to
somehow attach an attribute to the FP value of a bound? Maybe it's not
our task (neither subgroup nor maybe P1788) to figure out in detail, but
if we have an IA library that simply builds on FP arithmetic then this
might indeed prove difficult. How should the compiler know that some FP
value is a lower/upper bound? What happens on context switches when the
runtime system saves and restores the context? Are registers preserved
with their exact value or also spilled here? We should definitely put
this on our list!
Christian
- --
/"\
Christian Keil \ / ASCII Ribbon Campaign
mail:c.keil@xxxxxxxxxxxxx X against HTML email & vCards
/ \