Re: Meeting the Scope and Purpose of P754
Bob Davis wrote:
Begin below with your proposed solution.
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
1.x.y ReproducibleResults Clause
Oh, and I forgot to state an obvious point: correctly rounded elementary
functions are not only a consumer of reproducibility, they are also (and
mainly) a producer of reproducibility.
So the ReproducibleResults clause should mention correct rounding for the
elementary functions.
Now for a real-world use case:
Indeed the largest deployment of CRLibm code to date (in the LHC@Home
project) addressed an issue of reproducibility. And interestingly, it
was in distributed code. This project's goal was to distribute, on a
large number of untrusted machines, a simulation of the LHC for various
configurations of its superconducting magnets. As the machines were
untrusted, trust was ensured by redundancy: the same sub-computation was
sent to two machines, and the result was accepted only if the two results
matched. The problem was (here comes reproducibility) that only a
bit-for-bit match was acceptable: the phenomenon under simulation is
known to be chaotic, so a one-ulp difference may lead to arbitrarily
different results (that still make sense when averaged properly). This divergence
could not be caught by some "two results match up to some epsilon",
because, in a nutshell, the purpose of the simulation was to evaluate
this epsilon (to evaluate, for a given configuration of the magnets, how
many particles will be lost after many turns in the LHC). This is my
computer scientist's understanding of the problem; physicists will find
it an overly simplified view, but I hope you get the point.
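
To make the acceptance rule concrete, here is a minimal sketch in Python
(3.9 or later, for math.nextafter). It is not the LHC@Home or Boinc code:
the names bits, results_match and logistic_orbit are hypothetical, and the
logistic map merely stands in for "some chaotic computation".

```python
import math
import struct

def bits(x: float) -> int:
    """Return the 64-bit IEEE 754 pattern of a double as an integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]

def results_match(a, b) -> bool:
    """Accept a redundant pair only if every value matches bit for bit."""
    return len(a) == len(b) and all(bits(x) == bits(y) for x, y in zip(a, b))

def logistic_orbit(x0: float, steps: int) -> float:
    """Iterate the chaotic logistic map x -> 4x(1-x) (illustrative stand-in)."""
    x = x0
    for _ in range(steps):
        x = 4.0 * x * (1.0 - x)
    return x

if __name__ == "__main__":
    x0 = 0.3
    x1 = math.nextafter(x0, 1.0)  # the neighbouring double, one ulp above x0
    # The inputs differ in the last bit only, so the strict check rejects them.
    print(results_match([x0], [x1]))
    # After enough iterations the two orbits typically bear no resemblance.
    print(logistic_orbit(x0, 100), logistic_orbit(x1, 100))
```

The comparison is on bit patterns rather than with ==, so that +0.0 and
-0.0 count as different and identical NaNs count as equal, which is what
"bit-for-bit" means here.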
And to convert it to engineer hours: one CERN engineer spent IIRC three
months parallelizing the code using Boinc, then three more months
tracking last-bit differences, finally discovering that AMD and Intel
processors didn't have bit-for-bit compatible exp and log functions.
That is three months of my taxpayer money that I hope the upcoming
standard will save.
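
For what it is worth, tracking such last-bit differences between two
machines can be sketched as follows. This is only an illustration in
Python, assuming each machine dumps its intermediate values as a list of
finite doubles; ordered_key, ulp_distance and first_mismatch are
hypothetical names, not part of CRLibm or Boinc.

```python
import struct

def ordered_key(x: float) -> int:
    """Map a finite double to an integer so that adjacent doubles differ by 1."""
    b = struct.unpack("<q", struct.pack("<d", x))[0]  # signed 64-bit view
    # Negative doubles are sign-magnitude: mirror them below zero.
    return b if b >= 0 else -(b & 0x7FFFFFFFFFFFFFFF)

def ulp_distance(a: float, b: float) -> int:
    """Number of representable doubles between a and b (0 if equal)."""
    return abs(ordered_key(a) - ordered_key(b))

def first_mismatch(run_a, run_b):
    """Return (index, ulp distance) of the first differing value, or None."""
    for i, (x, y) in enumerate(zip(run_a, run_b)):
        d = ulp_distance(x, y)
        if d != 0:
            return i, d
    return None

if __name__ == "__main__":
    run_a = [1.0, 2.5, 0.1]
    run_b = [1.0, 2.5, 0.1 + 2.0**-56]  # last value is off by a few ulps
    print(first_mismatch(run_a, run_b))
```

Note that ulp_distance treats +0.0 and -0.0 as distance 0, so it is a
debugging aid and not a substitute for the strict bit-for-bit check.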
And to convert it into performance: one easy solution was to limit the
distribution of the code to identical systems. The other easy solution
was to link to the slightly slower CRLibm, and use nearly twice as many
machines. Reproducibility did win performance in this particular case.
If only it had been available out of the box, it would also have saved
three months' worth of computation.
(To be fair, I think Boinc was later extended to cater for redundancy
only on identical systems, but this was not an option at the time.)
Regards,
Florent (supposedly on holiday)