
Re: Meeting the Scope and Purpose of P754



Bob Davis wrote:
Begin below with your proposed solution.
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
1.x.y ReproducibleResults Clause

Oh, and I forgot to state an obvious point: correctly rounded elementary functions are not only a consumer of reproducibility, they are also (and mainly) a producer of reproducibility.

So the ReproducibleResults clause should mention correct rounding for the elementary functions.


Now for a real-world use case:
Indeed, the largest deployment of CRLibm code to date (in the LHC@Home project) addressed an issue of reproducibility. And interestingly, it was in distributed code. This project's goal was to distribute, over a large number of untrusted machines, a simulation of the LHC for various configurations of its superconducting magnets. As the machines were untrusted, trust was ensured by redundancy: the same sub-computation was sent to two machines, and the result was accepted if the two results matched. The problem was (here comes reproducibility) that only a bit-for-bit match was acceptable: the phenomenon under simulation is known to be chaotic, so a one-ulp difference may lead to arbitrarily different results (that still make sense when averaged properly). This divergence could not be caught by some "two results match up to some epsilon" test because, in a nutshell, the purpose of the simulation was to evaluate this epsilon (to evaluate, for a given configuration of the magnets, how many particles will be lost after many turns in the LHC). This is my computer scientist's understanding of the problem; physicists will find it an overly simplified view, but I hope you get the point.
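To make the difference between the two acceptance tests concrete, here is a minimal sketch. It is not the LHC@Home or BOINC validation code: the function names (`bits`, `next_up`, `match_bitwise`, `match_epsilon`), the tolerance `1e-12`, and the sample data are all hypothetical, chosen only to illustrate a redundancy check on two result vectors.

```python
import struct

def bits(x: float) -> int:
    """Return the IEEE 754 binary64 bit pattern of x as an unsigned integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]

def next_up(x: float) -> float:
    """Return the double one ulp above x (valid for positive finite x
    below the largest double; enough for this illustration)."""
    return struct.unpack("<d", struct.pack("<Q", bits(x) + 1))[0]

def match_bitwise(a, b) -> bool:
    """Accept only if both result vectors are bit-for-bit identical.
    Note: this also distinguishes +0.0 from -0.0 and different NaN payloads."""
    return len(a) == len(b) and all(bits(x) == bits(y) for x, y in zip(a, b))

def match_epsilon(a, b, eps: float = 1e-12) -> bool:
    """Accept if every pair agrees up to a relative tolerance eps
    (eps is an arbitrary illustrative value)."""
    return len(a) == len(b) and all(
        abs(x - y) <= eps * max(abs(x), abs(y)) for x, y in zip(a, b)
    )

# Hypothetical results returned by two machines for the same work unit:
# the second value differs by exactly one ulp.
machine_a = [1.0, 2.718281828459045]
machine_b = [1.0, next_up(2.718281828459045)]

print(match_epsilon(machine_a, machine_b))  # True: the epsilon test hides the difference
print(match_bitwise(machine_a, machine_b))  # False: the bit-for-bit test catches it
```

In a chaotic simulation the one-ulp difference that the epsilon test lets through is precisely what grows into a different final answer, which is why only the bit-for-bit test was usable here.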

And to convert it to engineer hours: one CERN engineer spent, IIRC, three months parallelizing the code using BOINC, then three more months tracking last-bit differences, finally discovering that AMD and Intel processors didn't have bit-for-bit compatible exp and log functions. These are three months of my taxpayer money that I hope the upcoming standard will save.
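As an illustration of how such last-bit differences can be tracked down, here is a minimal sketch that records the exact bit patterns returned by exp and log for a set of inputs, so that the records from two machines can be compared as plain text. This is an assumption about a reasonable approach, not the procedure the CERN engineer actually followed; `fingerprint` and `first_mismatch` are hypothetical helper names, and only Python's standard `math` module and `float.hex` are used.

```python
import math

def fingerprint(inputs):
    """For each positive input x, record x, exp(x) and log(x) as hexadecimal
    floating-point strings, which encode every bit of the double exactly."""
    return [
        (float(x).hex(), math.exp(x).hex(), math.log(x).hex())
        for x in inputs
    ]

def first_mismatch(fp_a, fp_b):
    """Return the first pair of differing records from two fingerprints,
    or None if they agree everywhere."""
    for row_a, row_b in zip(fp_a, fp_b):
        if row_a != row_b:
            return row_a, row_b
    return None

# Run this on each machine with the same inputs, save the output,
# and compare the saved records.
local = fingerprint([0.5, 1.0, 2.5, 10.0])
for x_hex, exp_hex, log_hex in local:
    print(x_hex, exp_hex, log_hex)
```

With a correctly rounded library, every machine would print the same records; with platform math libraries, `first_mismatch` applied to two saved fingerprints points to the first input whose exp or log differs in the last bit.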

And to convert it into performance: one easy solution was to limit the distribution of the code to identical systems. The other easy solution was to link to the slightly slower CRLibm, and use nearly twice as many machines. Reproducibility won over performance in this particular case. If only it had been available out of the box, it would also have gained three months' worth of computation.

(To be fair, I think BOINC was later extended to cater for redundancy only on identical systems, but this was not an option at the time.)

Regards,

   Florent (supposedly on holiday)
