Re: Another relevant consideration Re: Motion P1788/M0009.01_ExactDotProduct

To: "James Demmel" <demmel@xxxxxxxxxxxxxxx>, <rbk@xxxxxxxxxxxx>
Subject: Re: Another relevant consideration Re: Motion P1788/M0009.01_ExactDotProduct
From: "Siegfried M. Rump" <rump@xxxxxxxxxxxxx>
Date: Mon, 2 Nov 2009 15:09:10 +0100
Cc: "James Demmel" <demmel@xxxxxxxxxxxxxxxxx>, <rbk@xxxxxxxxxxxxx>, "Arnold Neumaier" <Arnold.Neumaier@xxxxxxxxxxxx>, "Michael Schulte" <schulte@xxxxxxxxxxxxx>, <stds-1788@xxxxxxxx>, "Koushik Sen" <ksen@xxxxxxxxxxxxxxx>, "W. Kahan" <wkahan@xxxxxxxxxxxxxxxxx>

We have had feedback from some users (e.g. Mathworks) that
determinism is critical to them.


Yes, usually results are NOT deterministic, only estimations are
available.

Mathworks themselves created problems with determinism. For a vector
x=randn(3,1) the statements x'*x and x(1)*x(1)+x(2)*x(2)+x(3)*x(3)
may yield different results because the first calls IMKL with intermediate
results stored in extended precision (on laptops).

Cheers,    Siegfried

I'm not sure how to quantify this tradeoff, or whether it makes sense
to address the needs of both kinds of users (at the likely cost of a more
complicated standard).

In the case of summations or dot products, insisting on correct rounding,
or narrowest interval enclosures, obviously guarantees determinism. I'm
not sure how much faster a looser requirement of just, say, a few ulps
would be, with or without specifying "determinism" in the standard.

Jim



Ralph Baker Kearfott wrote:

Jim,

Does the fact that virtually all machines are parallel now
imply that it is impossible for computations to be totally reproducible
anyway, or does it imply that reproducibility is still
achievable if we have sufficiently strict arithmetic
standards and they are implemented?

Baker

James Demmel wrote:

Independent (mostly) of the 4 levels of accuracy that Siegfried points
out below, there is the question of whether the standard should require
the operation to be deterministic, i.e. get the same answer whether it is
run on different machines, or even twice on the same machine. Unless the
answer is uniquely defined ("round to nearest even", "round down" etc.)
there is no reason an implementation on a platform with different numbers
of processors available at different times, even during a single run,
would always use the same reduction tree to compute a sum or analogous
operation, and so get the same answer.
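
The dependence on the reduction tree can be seen in a small sketch. The
data and the helper names sum_left_to_right and sum_pairwise are
illustrative assumptions, not taken from any library discussed in this
thread. The two trees give different answers, while Python's math.fsum
returns the same value for either ordering on this data:

```python
import math

def sum_left_to_right(values):
    # Serial reduction: ((v0 + v1) + v2) + ...
    total = 0.0
    for v in values:
        total += v
    return total

def sum_pairwise(values):
    # Balanced binary reduction tree, as a parallel sum might use.
    if len(values) == 1:
        return values[0]
    mid = len(values) // 2
    return sum_pairwise(values[:mid]) + sum_pairwise(values[mid:])

# Illustrative data: the exact sum is 2, but 1e16 + 1.0 is not
# representable in double precision and rounds back to 1e16.
data = [1e16, 1.0, -1e16, 1.0]

serial = sum_left_to_right(data)          # 1.0
tree = sum_pairwise(data)                 # 0.0
exact = math.fsum(data)                   # 2.0
exact_reordered = math.fsum(sorted(data)) # 2.0, order does not matter here

print(serial, tree, exact, exact_reordered)
```

Neither of the two plain reductions is "the" answer; only a uniquely
defined result, such as the correctly rounded sum, is independent of the
tree.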

We have both kinds of users of our linear algebra software, those for
whom speed is more important than determinism (and so using the fastest
reduction tree is more important than the same reduction tree), and those
who prioritize determinism (to ease debugging, not confuse users, and
sometimes because it is required). Since it is rapidly becoming impossible
to buy a non-parallel machine (because of multicore, etc.), I wonder what
people think the standard should say about determinism?

Jim

Siegfried M. Rump wrote:

On Sat, 31 Oct 2009 23:56:04 +0100, Ralph Baker Kearfott
<rbk@xxxxxxxxxxxx> wrote:

P-1788,

I've been trying to decide how to vote on this issue, and
am wrestling with the following consideration: If we demand
the exact dot product as stated in the motion, we will obtain
reproducibility. However, this will be at the expense of having
an efficient software implementation. We have efficient software
implementations (e.g. Rump et al.) of faithfully rounded dot products, which
would be standard-conforming, and would to my knowledge satisfy
the needs of applications requiring an accurate dot product, if we
loosened the requirement, but I don't know if we can word the
standard to both allow these and for the results to be bit-wise
reproducible.

This is just some food for thought, or for refutation.

Baker

P-1788,

Arnold asked me to comment on the issue, so here are some thoughts.

Let a floating-point arithmetic with relative rounding error unit eps
be given, e.g. 2^-53 in double precision. For the result of a sum or
dot product there are basically 4 different "accuracy levels":

1 ordinary (recursive) evaluation, accuracy ~ eps*cond
2 evaluation in precision eps^2, accuracy ~ eps^2*cond
3 faithfully rounded result, accuracy ~ eps
4 rounded to nearest result, accuracy ~ 0.5*eps
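
As a rough illustration of the gap between levels 1 and 2, here is a
minimal sketch of a dot product accumulated with error-free
transformations (TwoSum, and a product split in the Dekker/Veltkamp
style). The function names and the test vectors are assumptions made for
this example, and the sketch ignores intermediate under/overflow; it is
not claimed to be the exact algorithm referred to in this thread.

```python
from fractions import Fraction

def two_sum(a, b):
    # Returns (s, err) with s = fl(a + b) and a + b = s + err exactly.
    s = a + b
    bb = s - a
    err = (a - (s - bb)) + (b - bb)
    return s, err

def split(a):
    # Veltkamp splitting of a double into two halves, a = hi + lo.
    c = 134217729.0 * a  # 2**27 + 1
    hi = c - (c - a)
    return hi, a - hi

def two_product(a, b):
    # Returns (p, err) with p = fl(a * b) and a * b = p + err exactly
    # (assuming no under/overflow occurs).
    p = a * b
    a_hi, a_lo = split(a)
    b_hi, b_lo = split(b)
    err = a_lo * b_lo - (((p - a_hi * b_hi) - a_lo * b_hi) - a_hi * b_lo)
    return p, err

def dot_naive(x, y):
    # Level 1: ordinary recursive evaluation.
    total = 0.0
    for xi, yi in zip(x, y):
        total += xi * yi
    return total

def dot_doubled(x, y):
    # Level 2 sketch: carry the rounding errors in a second accumulator.
    p, s = 0.0, 0.0
    for xi, yi in zip(x, y):
        h, r = two_product(xi, yi)
        p, q = two_sum(p, h)
        s += q + r
    return p + s

# Illustrative ill-conditioned example: the exact value is 1.
x = [1e16, 1.0, -1e16]
y = [1.0, 1.0, 1.0]
exact = sum(Fraction(a) * Fraction(b) for a, b in zip(x, y))

print(dot_naive(x, y), dot_doubled(x, y), exact)
```

On this data the ordinary evaluation loses the result completely, while
the compensated version recovers it.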

It was asked, what do we need, and how much does it cost, very
reasonable questions.

First, what do we need. IMHO and for my scope of applications:

frequently level 1 is not enough.
In many cases level 2 is needed and it is sufficient.
Level 3 would be nice to have because it closes the gap between
precision and accuracy.
In special cases level 4 may be nice.

The difference between 3 and 4 reminds me of faithful and to-nearest
conversion. For example,

f = .93132257461547861902257656912845935892608650874535669572651386260986328126e-9

would likely yield the result 2^(-30), and scanning all except the last
mantissa digit "6" this is correctly rounded to nearest. But with the
last digit it is not correct. In both cases, 2^(-30) would be a correct
faithful rounding.
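
The conversion example can be checked with exact rational arithmetic. The
sketch below assumes a decimal-to-binary conversion that is correctly
rounded, as float() is in CPython on IEEE 754 platforms. The variant
F_TIE, with the last digit changed from 6 to 5, is a hypothetical value
introduced only to show where the midpoint lies.

```python
from fractions import Fraction

# The decimal string from the example above (leading zero added).
F = "0.93132257461547861902257656912845935892608650874535669572651386260986328126e-9"
# Hypothetical variant: last digit 6 replaced by 5.
F_TIE = F.replace("26e-9", "25e-9")

lower = 2.0 ** -30                  # candidate result 2^(-30)
upper = lower * (1.0 + 2.0 ** -52)  # next double above 2^(-30)
midpoint = (Fraction(lower) + Fraction(upper)) / 2  # the switching point

# F_TIE is exactly the midpoint; F lies just above it.
print(Fraction(F_TIE) == midpoint)
print(Fraction(F) > midpoint)

# A correctly rounded conversion must therefore return the upper neighbour
# for F, and the even neighbour 2^(-30) for the exact tie.
print(float(F) == upper)
print(float(F_TIE) == lower)
```

Both neighbours are faithful roundings of f, but only the upper one is
the rounded-to-nearest result, and a converter has to examine every digit
of the string to decide.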

A big advantage of rounding to nearest is that the result is uniquely
defined, whereas for faithful rounding, in general, two results meet
the specification.

From a numerical point of view, the computing time of an algorithm
producing a faithfully rounded result is proportional to the logarithm
of the condition number, in my opinion a nice property.

For rounding to nearest the computing time also depends on the nearness
of the result to the midpoint of adjacent floating-point numbers (call
that switching point), strange from a numerical point of view.

Second, how much does it cost.
I designed my algorithms to be fast in a high level language like C or
Fortran, and therefore branches or access to mantissa or exponents are
avoided. For a hardware implementation this changes completely.

For summation fast algorithms are available, and in general there is not
too much difference in computing time for levels 2 or 3 or 4. Of course,
if the result is very near a switching point, then time for level 4
increases; but those cases may be rare.

For dot product the picture changes. The new difficulties come from
more or less constructed examples, namely intermediate under- or
overflow.

Here fast algorithms for level 2 are available, and for many applications
sufficient.
For levels 3 or 4 care is necessary for intermediate under/overflow.
One possibility is scaling. Since I expect more or less constructed
examples with intermediate under/overflow, the time penalty may in
general be small.

With some care, the algorithms already discussed can be adapted from
faithful rounding to rounding to nearest, so that basically a time
penalty applies only if the exact result is near a switching point.

If one does not like to guess how often intermediate under/overflow or
nearness to a switching point occurs, then Malcolm's algorithm or a long
accumulator advocated by Kulisch may be used.
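
The effect of an exact accumulation followed by a single rounding can be
imitated in software. The sketch below uses Python's rational arithmetic
as a stand-in for a long accumulator; it is neither Malcolm's algorithm
nor a fixed-point accumulator as proposed by Kulisch, and it is far
slower than either. The function names and the data are assumptions for
this example.

```python
import math
from fractions import Fraction
from itertools import permutations

def dot_exact_rounded(x, y):
    # Accumulate every product exactly, then round once to nearest.
    acc = Fraction(0)
    for xi, yi in zip(x, y):
        acc += Fraction(xi) * Fraction(yi)
    return float(acc)

def dot_naive(x, y):
    # Ordinary evaluation in double precision.
    total = 0.0
    for xi, yi in zip(x, y):
        total += xi * yi
    return total

# Illustrative data with intermediate overflow: 1e200 * 1e200 exceeds the
# double range, although the exact dot product is just 1.5.
x = [1e200, 3.0, -1e200]
y = [1e200, 0.5, 1e200]

naive = dot_naive(x, y)            # inf + 1.5 - inf gives nan
exact = dot_exact_rounded(x, y)    # 1.5

# The exactly accumulated result does not depend on the summation order.
pairs = list(zip(x, y))
results = {dot_exact_rounded(*zip(*p)) for p in permutations(pairs)}

print(naive, exact, results)
```

Because the accumulated value is exact before the final rounding, neither
intermediate overflow nor the order of the terms affects the result.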

Although few people may find it necessary to have _always_ a rounded to
nearest result, a clear advantage is that the result is unique. From a
numerical point of view, even level 2 would be a big step and often
sufficient.

Best wishes

Siegfried M. Rump

--=====================================================
Prof. Dr. Siegfried M. Rump
Institute for Reliable Computing
Hamburg University of Technology
Schwarzenbergstr. 95
21071 Hamburg
Germany
phone +49 40 42878 3027
fax +49 40 42878 2489
http://www.ti3.tu-harburg.de

and

Visiting Professor at Waseda University
Faculty of Science and Engineering
Shinjuku Lambdax Bldg. 902
2-4-12 Okubo, Shinjuku-ku
Tokyo 169-0072
Japan
phone/fax in Japan +81 3 5286 3414

