Re: interval arithmetic
Since a motivation of this work is making floating point addition
effectively associative, i.e. making the computed sum bit-wise
reproducible no matter what the order of summation is, I'd like to
point out some related work that accomplishes this, to a precision
chosen by the user (including close to the usual error bound). The
algorithm does just one pass over
the data (in any order), uses only conventional IEEE 754 operations,
and uses only a limited number of registers, depending on the chosen
precision. Our base implementation can get an absolute error bound of
roughly
(*) n*macheps*2^(-28)*max_i |s_i| + 7*macheps*|S|
where s_1,...,s_n are the summands and S is the exact sum; this is up
to about 2^(-28) times smaller than the error bound for conventional
summation, using just 6 double precision floating point registers and
costing about 7*n flops.
The slowdown compared to conventional (highly tuned) summation ranges
from 3x (n=1M on 1 processor) to 1.2x (n=1M on p=1024 processors, each
processor getting 1024 summands). Increasing the number of registers
from 6 to 2*K (for K>3) lowers the error bound to roughly
(**) n*2^(40*(1-K))*max_i |s_i| + 7*macheps*|S|
and raises the cost to roughly (3*K-2)*n flops; for example, K=4
shrinks the first term to n*2^(-120)*max_i |s_i| at a cost of about
10*n flops. It is possible to imagine
additional specialized floating point operations to accelerate this; we have
spoken to one hardware vendor about this possibility (at their request).
For details, and a preliminary release of the ReproBLAS (Reproducible BLAS)
based on this approach, see
http://bebop.cs.berkeley.edu/reproblas/index.php
We are close to a new release, along with a more comprehensive
description of the algorithms, error bounds and software.
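
To give a feel for how a one-pass, order-independent summation can be
built from ordinary IEEE 754 operations, here is a small C sketch of
the underlying "extraction" idea. It is deliberately simplified and
hypothetical: the bin boundaries are fixed in advance (assuming
|x_i| < 2^30 and n <= 2^20, with 32-bit bins rather than 40-bit ones),
whereas the actual algorithm described above chooses them based on the
data, which is part of what makes the one-pass, any-order property
nontrivial. The point is only that each bin accumulator never incurs a
rounding error, so the bin contents, and hence the final sum, are
identical for every summation order:

/* reprosum_sketch.c: a simplified, hypothetical illustration of
 * reproducible summation by extraction against fixed "shifts".
 * Assumes |x[i]| < 2^30 and n <= 2^20; compile without -ffast-math
 * so the compiler keeps the floating point operations as written. */
#include <math.h>

#define K 3   /* number of bins: 2*K doubles of state (shift[], bin[]) */
#define W 32  /* bits per bin; needs log2(n) + W < 54 for exactness    */

double repro_sum_sketch(const double *x, int n)
{
    double shift[K], bin[K] = {0.0};
    /* shift[k] = 1.5 * 2^(m_k): adding and then subtracting it rounds
     * a small value to a multiple of 2^(m_k - 52), with no error in
     * either operation.  m_0 = 50 leaves headroom so bin[0] is always
     * an exact multiple of 2^(-2) fitting in 53 bits for n <= 2^20.  */
    for (int k = 0; k < K; k++)
        shift[k] = 1.5 * ldexp(1.0, 50 - k * W);

    for (int i = 0; i < n; i++) {
        double r = x[i];
        for (int k = 0; k < K; k++) {
            double q = (shift[k] + r) - shift[k]; /* exact extraction  */
            bin[k] += q; /* exact: multiples of one fixed ulp, never
                            carrying out of 53 bits for n <= 2^20      */
            r -= q;      /* exact remainder, handed to the next bin    */
        }
        /* bits of r below the last bin are dropped: they are what the
         * n*2^(...)*max_i |s_i| terms in (*) and (**) account for     */
    }
    /* the ONLY roundings happen here, in a fixed order, so the result
     * is bit-identical for every ordering of the x[i]                 */
    double s = 0.0;
    for (int k = 0; k < K; k++) s += bin[k];
    return s;
}

This naive sketch costs about 4*K*n flops; the (3*K-2)*n figure quoted
above comes from a more careful formulation, and the dynamic choice of
bins is what removes the a priori restrictions on |x_i| and n.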
This email exchange raises the natural question of whether this
algorithm (and related ones) could be modified to provide interval
bounds. It seems straightforward to use the error bound (*) or (**)
to create guaranteed and reproducible upper and lower bounds, and I
expect that they could be tightened.
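
For instance (a hedged sketch, not part of ReproBLAS): since
max_i |s_i| is itself order-independent, one could evaluate the bound
(*) from computed quantities and widen the computed sum outward, along
these lines. Here repro_sum_sketch is the routine sketched above,
DBL_EPSILON (2^-52) stands in for macheps, and the computed sum s
stands in for the exact |S| after a crude inflation whose constant I
have not verified:

#include <float.h>
#include <math.h>

double repro_sum_sketch(const double *x, int n);  /* sketched above */

/* Reproducible enclosure [lo, hi] of the exact sum, built from the
 * bound (*).  Every quantity below is computed by a fixed expression
 * from order-independent inputs, so the interval itself is
 * reproducible.  (A one-pass version would track maxabs inside the
 * summation loop.) */
void repro_sum_interval(const double *x, int n, double *lo, double *hi)
{
    double s = repro_sum_sketch(x, n);   /* reproducible point sum    */
    double maxabs = 0.0;                 /* max is exact and          */
    for (int i = 0; i < n; i++)          /* order-independent         */
        if (fabs(x[i]) > maxabs) maxabs = fabs(x[i]);

    const double eps = DBL_EPSILON;      /* 2^-52, as "macheps" here  */
    double err = (double)n * eps * ldexp(maxabs, -28) /* 1st term of (*) */
               + 7.0 * eps * fabs(s);                 /* 2nd term of (*) */
    err *= 1.0 + 16.0 * eps; /* unverified inflation: covers |s| vs |S|
                                and the roundings in computing err    */
    *lo = nextafter(s - err, -DBL_MAX);  /* widen outward one ulp     */
    *hi = nextafter(s + err,  DBL_MAX);
}

Deriving the enclosure directly from the bin contents, rather than
from the a priori bound (*), is presumably where the tightening
mentioned above would come from.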
This is joint work with Diep Nguyen and Peter Ahrens. There is a
growing amount of other related work (see our references), motivated
by the fact that even your multicore laptop no longer guarantees
reproducible results, because of nonassociative floating point
addition combined with nondeterministic execution order.
In fact Intel recently released a new version of MKL with "CNR"
(Conditional Numerical Reproducibility), which guarantees reproducible
results provided the user runs with the same number of cores each
time. This was (obviously) motivated by customer demand, a primary
reason being debugging. MKL CNR works by fixing the order of
execution, but this approach does not scale up to large numbers of
processors.
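
For concreteness, CNR is switched on either through the environment
(e.g. MKL_CBWR=AUTO) or programmatically before any other MKL call,
roughly as below; the exact constant names should be checked against
the MKL version in use:

#include <mkl.h>

int main(void)
{
    /* strictest CNR mode: reproducible across Intel CPU generations */
    mkl_cbwr_set(MKL_CBWR_COMPATIBLE);
    /* results are only guaranteed for a fixed number of threads     */
    mkl_set_num_threads(8);
    /* ... subsequent MKL calls (dgemm, cblas_ddot, ...) now return
     *     bit-identical results from run to run on this machine ... */
    return 0;
}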
Jim Demmel
On 1/1/16 12:22 AM, Ulrich Kulisch wrote:
Dear friends,
I wish you all a happy and successful year 2016!
For my more recent mails to the group I did not get a single reply.
Nevertheless I attach a not yet published paper:
High Speed Associative Accumulation of Floating-point Numbers and
Floating-point Intervals
It may help one reader or another to understand a secret of the
success of interval arithmetic.
Best regards
Ulrich