Re: reproducible format



You're right in the presence of communicating threads, but those are not reproducible anyway, even in integer arithmetic, so this is not floating point's problem to solve. Absent threading, the hardware of my acquaintance goes to considerable effort to provide the illusion of in-order execution even with an out-of-order reality, and some vendors (such as my own company) are in-order to begin with.

Ivan

Nick Maclaren wrote:
Ivan Godard <igodard@xxxxxxxxxxx> wrote:

> All hardware is deterministic, so if you need to reproduce results then
> all you have to do is rerun the same data on the same hardware.  ...

That is not true, is getting a lot less true, and is the reason that
the pursuit of reproducibility is attempting to solve yesterday's
problems tomorrow.

The killer here is parallelism, and almost all modern hardware is
increasingly parallel, even within the floating-point units.  To
get reproducible parallel execution, you almost always need a
unique order, which means either a consistent, precise global
clock or to abandon many of the advantages of parallelism for
performance by requiring the semantics to be "as if executed purely
serially".
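
To make the ordering point concrete, here is a minimal sketch (not
taken from either poster) of why a floating-point result depends on
the order in which partial results are combined.  The function names
and the four input values are hypothetical, chosen only so that the
rounding is easy to follow in IEEE 754 double precision.  Explicit
loops are used instead of Python's built-in sum(), which in recent
versions compensates for rounding error and would hide the effect.

```python
def serial_sum(values):
    """Add the values strictly left to right, as one serial unit would."""
    total = 0.0
    for v in values:
        total += v
    return total


def two_unit_sum(values):
    """Model two parallel units: each sums half the data, then the
    two partial sums are combined."""
    mid = len(values) // 2
    return serial_sum(values[:mid]) + serial_sum(values[mid:])


# Hypothetical data: at magnitude 1e16 adjacent doubles are 2 apart,
# so adding 1.0 to 1e16 is rounded away.
values = [1e16, 1.0, -1e16, 1.0]

in_order = serial_sum(values)                      # ((1e16 + 1) - 1e16) + 1
split = two_unit_sum(values)                       # (1e16 + 1) + (-1e16 + 1)
reordered = serial_sum([1e16, -1e16, 1.0, 1.0])    # same data, another order

print(in_order, split, reordered)   # 1.0 0.0 2.0
```

The same four numbers give three different sums, so a parallel
reduction reproduces its result only if the combination order is
pinned down, which is the "unique order" requirement above.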

Global clocks are Bad News, and all major chip manufacturers are
trying to get away from them.  I believe that 50% of the power in
a current mainstream chip is needed to drive the clock, and the
proportion is increasing.

But the problems get worse.  As soon as any sub-units can handle
unpredictable exceptions (e.g. parity/checksum/consistency errors)
on their own (usually by retrying), the requirement for a defined
order means that the whole SYSTEM has to block while that is
sorted out.  And that is catastrophic.

Note that NONE of that is an academic diversion, because I am
describing problems that are really hurting the industry TODAY,
and that I can witness occurring on real systems.  For example,
take a look at the hardware journals etc. for the work on clocking.

I agree that it is technically possible to require reproducibility
for IEEE 754R, because forcing serialisation of the basic floating
point operations is not a big deal.  But, while it makes life a
lot easier for some people and provides real benefits for a few,
TODAY, they are going to be forced to face up to the fact that it
is an unattainable goal TOMORROW.


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  nmm1@xxxxxxxxx
Tel.:  +44 1223 334761    Fax:  +44 1223 334679