Re: GRE: rogue comment: IEEE 754R and Reproducibility
I'd like to address the importance of reproducibility.
These comments are based on an ongoing discussion with many people in
industry and academia about the changes that multicore will bring.
The emerging common vision is that there will be two layers at which
people will program:
The "efficiency layer" exposes all the details of the architecture so
that performance can be maximized. It will be programmed by a small
fraction of expert programmers, some of whom will in turn produce the
next layer.
The "productivity layer" hides as many details of parallelism and the
architecture as possible, and is intended to be easily programmable
by the vast majority of programmers. These are the programmers that
companies like Intel, Microsoft, etc. have to support, or else go out
of business.
Given this dichotomy, I think there are several situations in which the
importance and difficulty of reproducibility vary:
- reproducibility across different platforms (hard) versus
  when you type a.out twice in a row on the same platform (easier)
- reproducibility in general parallel code written at the "efficiency
  layer" with asynchronous updates to global variables like sums (hard)
  versus using only "productivity level" building blocks constructed
  appropriately (easier)
- reproducibility for experts who think they know what they're doing
  regarding error analysis (less important) versus the large
  majority of programmers trying to debug codes written at the
  productivity level (more important)
- reproducibility when the code is "in principle" deterministic (more
  important), as opposed to when it is built with and advertised as
  having nondeterministic components using techniques like
  randomization (less important).
My multicore collaborators think that, independent of floating point,
in order to support the vast majority of programmers at the
productivity level, the system
(1) must support debugging at the productivity layer, i.e. give the
same results twice in a row from a.out, if necessary with a
performance hit, modulo the next point;
(2) should not add nondeterminism to an otherwise deterministically
constructed program at the "productivity layer", or should at least
require an explicit setting of an "allow nondeterminism if it speeds
things up a lot" switch.
My question about IEEE 754R is whether there are any features that
make this (modest) level of reproducibility hard to achieve? Clearly
care must be taken to make sure operations like sum-reductions
can occur in a deterministic order if desired.
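
To make the sum-reduction concern concrete, here is a minimal Python
sketch. It assumes a reduction whose partial sums arrive in a
run-dependent order; the function names and the sample data are
hypothetical and chosen only for illustration. Because floating-point
addition is not associative, combining the partials in arrival order
can change the result, whereas combining them in a fixed order (here,
by index) cannot.

```python
from functools import reduce
from operator import add


def fold(values):
    """Plain left-to-right floating-point addition, one rounding per step."""
    return reduce(add, values, 0.0)


def reduce_in_arrival_order(partials, arrival_order):
    """Combine partial sums in whatever order they happened to arrive."""
    return fold(partials[i] for i in arrival_order)


def reduce_in_fixed_order(partials, arrival_order):
    """Combine partial sums by index, ignoring the order of arrival."""
    return fold(partials[i] for i in sorted(arrival_order))


# Hypothetical partial sums from four workers, and two possible arrival orders.
partials = [1e16, 1.0, -1e16, 1.0]
run_a = [0, 1, 2, 3]
run_b = [0, 2, 1, 3]

print(reduce_in_arrival_order(partials, run_a),
      reduce_in_arrival_order(partials, run_b))
print(reduce_in_fixed_order(partials, run_a),
      reduce_in_fixed_order(partials, run_b))
```

The fixed-order variant gives up some scheduling freedom, which is the
kind of performance hit mentioned in point (1) above.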
We have also been asking if there are any legal requirements on
reproducibility (of the a.out-twice-in-a-row kind) that might arise
from FDA rules on medical data processing, for example, but have
not unearthed any yet.
I welcome comments on this.
Jim Demmel
Nick Maclaren wrote:
[ Please do not copy me on replies. ]
I am sorry, but the objective and paper remain, as they always were,
so unrealistic as to justify some impoliteness. It is possible to
argue that reproducibility is a desirable objective, yes, but it is
fallacious to imply that it is entirely beneficial. But let's skip
that aspect.
It is theoretically possible to support reproducibility in combination
with most forms of (serial) optimisation, though IEEE 754R does not
support the necessary approach and it would have to be done as another
layer on top. The key is that a datum must be a value+flags, so that
global flags (with their implied ordering) are entirely abolished.
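
One possible reading of "value+flags" is sketched below in Python. The
class name, the flag constants and the add() helper are all
hypothetical, and exactness is checked with rational arithmetic purely
for illustration. The flags raised by an operation are stored in its
result and merged from its operands, so they depend only on the data
flow and not on the order in which independent operations are run.

```python
import math
from dataclasses import dataclass
from fractions import Fraction

# Hypothetical flag bits, loosely modelled on the IEEE 754 exceptions.
INEXACT, OVERFLOW, INVALID = 1, 2, 4


@dataclass(frozen=True)
class Flagged:
    """A floating-point value that carries its own exception flags."""
    value: float
    flags: int = 0


def add(a, b):
    """Add two flagged values; the result inherits both operands' flags."""
    result = a.value + b.value
    flags = a.flags | b.flags
    if math.isnan(result):
        # A NaN produced from non-NaN operands (e.g. inf + -inf) is invalid.
        if not (math.isnan(a.value) or math.isnan(b.value)):
            flags |= INVALID
    elif math.isinf(result):
        # An infinity produced from finite operands is an overflow.
        if math.isfinite(a.value) and math.isfinite(b.value):
            flags |= OVERFLOW | INEXACT
    elif Fraction(result) != Fraction(a.value) + Fraction(b.value):
        # The rounded result differs from the exact rational sum.
        flags |= INEXACT
    return Flagged(result, flags)


x = add(Flagged(1e16), Flagged(1.0))   # rounds, so INEXACT is recorded in x
y = add(Flagged(1.0), Flagged(2.0))    # exact, no flags
print(x.flags, y.flags, add(x, y).flags)
```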
It still sacrifices SOME optimisations, but let that pass.
And, despite what you say, reproducible reductions are not hard to
specify for any simple, side-effect-free elemental operation. For
example, just require them to be done in extra precision and to
return an error bound; if that leads to ambiguity, repeat in serial
mode or iterate up precisions. Whether it's worth the effort is
less clear.
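
A minimal Python sketch of the "extra precision plus error bound, then
iterate up precisions" idea follows. It is one possible realisation
under stated assumptions, not a definitive one: the function names are
hypothetical, the extra precision comes from the standard decimal
module, and the error bound is obtained by summing once with rounding
toward minus infinity and once toward plus infinity. If both bounds
round to the same double, the answer is unambiguous and does not
depend on the order of summation; otherwise the precision is doubled
and the sum repeated.

```python
from decimal import Context, Decimal, ROUND_CEILING, ROUND_FLOOR


def bounded_sum(values, prec):
    """Return (lo, hi) with lo <= exact sum <= hi, using prec decimal digits."""
    down = Context(prec=prec, rounding=ROUND_FLOOR)
    up = Context(prec=prec, rounding=ROUND_CEILING)
    lo = hi = Decimal(0)
    for x in values:
        d = Decimal(x)          # converting a float to Decimal is exact
        lo = down.add(lo, d)    # never rounds above the exact partial sum
        hi = up.add(hi, d)      # never rounds below the exact partial sum
    return lo, hi


def reproducible_sum(values, prec=20):
    """Sum finite floats so that the result is independent of their order."""
    values = list(values)
    while True:
        lo, hi = bounded_sum(values, prec)
        if float(lo) == float(hi):
            # Both bounds round to the same double: no ambiguity remains.
            return float(hi)
        prec *= 2               # ambiguous: iterate up precisions


data = [1e16, 1.0, -1e16, 1.0]
print(reproducible_sum(data, prec=5), reproducible_sum(data[::-1], prec=5))
```

The loop terminates for finite inputs because every double has a
finite decimal expansion, so at some precision all the additions
become exact and the two bounds coincide.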
But it is PROVABLY IMPOSSIBLE to achieve in a general parallel code,
and attempting a target that has a mathematical proof of absolute
impossibility is Not Clever. If you want to do that (and it is a
perfectly reasonable objective), you MUST face up to the issue and
specify some way out.
As I pointed out, consider a program with a fixed number of threads
(thus evading your dynamic exclusion), all loads and stores atomic
(to avoid race conditions and subsequent undefined behaviour).
Requiring a reproducible result needs more than Lamport sequential
consistency (in itself a problem), because every implementation
must meet the SAME sequential ordering. You are therefore demanding
a global, absolute, precise clock model - and nobody in the parallel
field believes that such a thing is even theoretically possible, in
combination with even tolerable efficiency OR scalability.
[ In this context, "global" means that the same clock is accessible
to all threads, "absolute" means that all threads get the same value
and "precise" means that any two events have different timestamps. ]
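
The ordering point can be illustrated with a small Python sketch. It
is a simplification under stated assumptions: two hypothetical
threads, each issuing atomic additions to one shared accumulator, with
every interleaving that preserves each thread's program order
enumerated serially rather than run on real threads. Every such
schedule is sequentially consistent, yet the schedules do not all
produce the same floating-point result, so a reproducible answer
requires every implementation to pick the same schedule.

```python
from itertools import combinations


def interleavings(a, b):
    """Yield every merge of a and b that keeps each list's own order."""
    n = len(a) + len(b)
    for positions in combinations(range(n), len(a)):
        chosen = set(positions)
        ia, ib = iter(a), iter(b)
        yield [next(ia) if i in chosen else next(ib) for i in range(n)]


def run(schedule):
    """Apply the atomic increments to a shared accumulator, one at a time."""
    total = 0.0
    for increment in schedule:
        total += increment
    return total


# Hypothetical per-thread increments to the shared sum.
thread_a = [1.0]
thread_b = [1e16, -1e16]

outcomes = {run(s) for s in interleavings(thread_a, thread_b)}
print(sorted(outcomes))
```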
Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email: nmm1@xxxxxxxxx
Tel.: +44 1223 334761 Fax: +44 1223 334679