
Re: interval arithmetic



Since a motivation of this work is making floating-point addition
associative, i.e. making sums bit-wise reproducible no matter what the
order of summation is, I'd like to point out some related work that
accomplishes this, to a precision chosen by the user (including error
bounds close to the usual one). The algorithm does just one pass over
the data (in any order), uses only conventional IEEE 754 operations,
and uses only a limited number of registers, depending on the chosen
precision. Our base implementation can get an absolute error bound of roughly
(*)   n*macheps*2^(-28)*max_i |s_i| + 7*macheps*|S|
where s_1,...,s_n are the summands, and S is the exact sum, so up to about
2^(-28) times smaller than conventional summation, using just 6
double precision floating point registers, and costing about 7*n flops.
The slowdown compared to conventional (highly tuned) summation ranges from
3x (n=1M on 1 processor)  to 1.2x (n=1M on p=1024 processors, each processor
getting 1024 summands). Increasing the number of registers from 6 to 2*K (for K>3)
lowers the error bound to roughly
(**)      n*2^(40*(1-K))*max_i |s_i| + 7*macheps*|S|
and raises the cost to roughly (3*K-2)*n flops. It is possible to imagine
additional specialized floating point operations to accelerate this; we have
spoken to one hardware vendor about this possibility (at their request).
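To illustrate why order-independence is achievable at all, here is a minimal sketch (not the ReproBLAS algorithm, and far less register-efficient): each double is converted exactly to an integer multiple of a fixed power of two and accumulated in an arbitrary-precision integer, a software analogue of a Kulisch-style long accumulator. Integer addition is associative, so any summation order gives the bit-identical result; the single rounding happens at the end. The name and the SCALE constant are illustrative choices, not part of the work described above.

```python
import math

def exact_fixed_point_sum(xs):
    # Every finite binary64 value is an integer multiple of 2^-1074,
    # so scaling by 2^1200 makes every summand an exact integer.
    SCALE = 1200
    acc = 0  # arbitrary-precision integer accumulator (exact, associative)
    for x in xs:
        m, e = math.frexp(x)           # x = m * 2**e, 0.5 <= |m| < 1
        # m has at most 53 significant bits, so m * 2**53 is exactly integral;
        # the shift count e - 53 + SCALE is >= 0 for every finite double.
        acc += int(m * 2.0**53) << (e - 53 + SCALE)
    # One correctly rounded division; result is independent of summation order.
    # (A sketch only: very large |acc| could overflow to infinity here.)
    return acc / 2**SCALE
```

Unlike the 6-register algorithm above, this costs memory proportional to the exponent range, but it makes the reproducibility mechanism transparent: all rounding is deferred to a single final step.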

For details, and a preliminary release of the ReproBLAS (Reproducible BLAS)
based on this approach, see
    http://bebop.cs.berkeley.edu/reproblas/index.php
We are close to a new release, and also a more comprehensive description
of the algorithms, error bounds and software.

This email exchange raises the natural question of whether this algorithm
(and related ones) could be modified to provide interval bounds. It seems
straightforward to use the error bound (*) or (**) to create guaranteed and
reproducible upper and lower bounds, and I expect that they could be tightened.
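As a sketch of that construction: given the computed sum, n, and max_i |s_i|, one can evaluate the bound (*) and widen outward by one ulp on each side. The constants below are taken from (*); substituting the computed sum for the exact |S| (with the 7 inflated to 8 to absorb that substitution) and the choice macheps = 2^-53 are my assumptions, not claims from the papers, and no tightening is attempted.

```python
import math

def reproducible_sum_interval(s_hat, n, max_abs):
    # s_hat: the reproducibly computed sum; n: number of summands;
    # max_abs: max_i |s_i|.  Evaluates the bound (*) from the text.
    eps = 2.0 ** -53            # macheps convention assumed here
    # (*) uses the exact |S|; we use |s_hat| and inflate 7 -> 8 to cover
    # the difference (an assumption for this sketch, not a proved bound).
    err = n * eps * 2.0 ** -28 * max_abs + 8 * eps * abs(s_hat)
    # Widen one ulp outward so the endpoint roundings cannot lose containment.
    lo = math.nextafter(s_hat - err, -math.inf)
    hi = math.nextafter(s_hat + err, math.inf)
    return lo, hi
```

Since s_hat, n, and max_abs are all reproducible, the resulting interval is bit-wise reproducible as well, regardless of summation order or processor count.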

This is joint work with Diep Nguyen and Peter Ahrens. There is a growing amount of other related work (see our references), motivated by the fact that even your multicore laptop no longer guarantees reproducible results because of nonassociative
floating point addition, combined with nondeterministic execution order.

In fact Intel recently released a new version of MKL with "CNR" = Conditional Numerical Reproducibility that guarantees reproducibility, provided that the user guarantees to use the same number of cores. This was (obviously) motivated by
customer demand, a primary reason being debugging. MKL CNR works by
guaranteeing the order of execution, but this approach does not scale up
to large numbers of processors.

Jim Demmel



On 1/1/16 12:22 AM, Ulrich Kulisch wrote:
Dear friends,

I wish you all a happy and successful year 2016!

I did not get a single reply to my more recent mails to the group. Nevertheless I attach a not yet published paper:

High Speed Associative Accumulation of Floating-point Numbers and Floating-point Intervals

It may help some of you to understand a secret of the success of interval arithmetic.

Best regards
Ulrich