
Re: Another relevant consideration Re: Motion P1788/M0009.01_ExactDotProduct



Jim,

Does the fact that virtually all machines are now parallel
imply that it is impossible for computations to be totally
reproducible anyway, or does it imply that reproducibility
is still achievable if we have sufficiently strict arithmetic
standards and they are actually implemented?

Baker

James Demmel wrote:
> Independent (mostly) of the 4 levels of accuracy that Siegfried points
> out below, there is the question of whether the standard should require
> the operation to be deterministic, i.e., to get the same answer whether
> it is run on different machines, or even twice on the same machine.
> Unless the answer is uniquely defined ("round to nearest even", "round
> down", etc.) there is no reason an implementation on a platform with
> different numbers of processors available at different times, even
> during a single run, would always use the same reduction tree to
> compute a sum or analogous operation, and so get the same answer.
> 
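To make the reduction-tree point concrete: a minimal C sketch (the
addends are contrived so that two tree shapes round differently):

    #include <stdio.h>

    int main(void) {
        /* Same four addends, two reduction trees. */
        double x[4] = {1.0, 0x1p-53, 0x1p-53, -1.0};

        /* Serial tree: ((x0 + x1) + x2) + x3.
           1 + 2^-53 is a tie and rounds to 1 (twice), so the -1 cancels. */
        double serial = ((x[0] + x[1]) + x[2]) + x[3];

        /* Balanced (pairwise) tree: (x0 + x1) + (x2 + x3).
           2^-53 - 1 is exactly representable, so the 2^-53 survives. */
        double pairwise = (x[0] + x[1]) + (x[2] + x[3]);

        printf("serial   = %.17g\n", serial);    /* 0                      */
        printf("pairwise = %.17g\n", pairwise);  /* 1.1102230246251565e-16 */
        return 0;
    }

A runtime that picks the tree from the number of processors available
at that moment can therefore return either answer.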
> We have both kinds of users of our linear algebra software: those for
> whom speed is more important than determinism (so that using the
> fastest reduction tree matters more than using the same reduction
> tree), and those who prioritize determinism (to ease debugging, not
> confuse users, and sometimes because it is required). Since it is
> rapidly becoming impossible to buy a non-parallel machine (because of
> multicore, etc.), I wonder what people think the standard should say
> about determinism?
> 
> Jim
> 
> Siegfried M. Rump wrote:
>> On Sat, 31 Oct 2009 23:56:04 +0100, Ralph Baker Kearfott
>> <rbk@xxxxxxxxxxxx> wrote:
>>
>>> P-1788,
>>>
>>> I've been trying to decide how to vote on this issue, and
>>> am wrestling with the following consideration: if we demand
>>> the exact dot product as stated in the motion, we obtain
>>> reproducibility, but at the expense of an efficient software
>>> implementation. We have efficient software implementations
>>> (e.g. Rump et al) of faithfully rounded dot products, which
>>> would be standard-conforming and would, to my knowledge,
>>> satisfy the needs of applications requiring an accurate dot
>>> product if we loosened the requirement; but I don't know
>>> whether the standard can be worded both to allow these and
>>> to keep the results bit-wise reproducible.
>>>
>>> This is just some food for thought, or for refutation.
>>>
>>> Baker
>>>
>> P-1788,
>>
>> Arnold asked me to comment on the issue, so here are some thoughts.
>>
>> Let a floating-point arithmetic with relative rounding error unit eps
>> be given, e.g. 2^-53 in double precision. For the result of a sum or
>> dot product there are basically 4 different "accuracy levels":
>>
>> 1 ordinary (recursive) evaluation, accuracy ~ eps*cond
>> 2 evaluation in precision eps^2, accuracy ~ eps^2*cond (sketched below)
>> 3 faithfully rounded result, accuracy ~ eps
>> 4 rounded to nearest result, accuracy ~ 0.5*eps
>>
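Level 2 is reachable in working precision alone via error-free
transformations, in the style of the Dot2 algorithm of Ogita, Rump and
Oishi; a minimal C sketch, using an fma for the exact product (the
published algorithm uses Dekker/Veltkamp splitting instead):

    #include <math.h>

    /* TwoSum (Knuth): s + e == a + b exactly, with s = fl(a + b). */
    static void two_sum(double a, double b, double *s, double *e) {
        *s = a + b;
        double bv = *s - a;
        *e = (a - (*s - bv)) + (b - bv);
    }

    /* TwoProduct via fma: p + e == a*b exactly, with p = fl(a*b). */
    static void two_prod(double a, double b, double *p, double *e) {
        *p = a * b;
        *e = fma(a, b, -*p);
    }

    /* Dot2-style dot product: level-2 accuracy, ~ eps^2 * cond. */
    double dot2(const double *x, const double *y, int n) {
        double p, s = 0.0, c = 0.0, e1, e2;
        for (int i = 0; i < n; i++) {
            two_prod(x[i], y[i], &p, &e1);  /* exact product     */
            two_sum(s, p, &s, &e2);         /* exact running sum */
            c += e1 + e2;                   /* accumulate errors */
        }
        return s + c;
    }

Both e1 and e2 are exact error terms, so the returned value is as
accurate as if the dot product had been evaluated in doubled precision
and rounded once, i.e. level 2.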
>> Two very reasonable questions were asked: what do we need, and how
>> much does it cost?
>>
>> First, what do we need? IMHO, and for my scope of applications:
>>
>> Frequently, level 1 is not enough.
>> In many cases level 2 is needed, and it is sufficient.
>> Level 3 would be nice to have because it closes the gap between
>> precision and accuracy.
>> In special cases level 4 may be nice.
>>
>> The difference between 3 and 4 reminds me of faithful versus
>> to-nearest conversion. For example, converting
>>
>> f = .93132257461547861902257656912845935892608650874535669572651386260986328126e-9
>>
>> would likely yield the result 2^(-30). Scanning all except the last
>> mantissa digit "6", the value lies just below the midpoint between
>> 2^(-30) and its successor, so 2^(-30) is correctly rounded to nearest;
>> taking the last digit into account, the value lies just above the
>> midpoint, so rounding to nearest must yield the successor instead. In
>> both cases, 2^(-30) would be a correct faithful rounding.
>>
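The conversion example can be checked mechanically, assuming an strtod
that rounds to nearest (for long digit strings, C itself guarantees
only one of the adjacent values, i.e. a faithful result, which is
precisely the distinction at issue):

    #include <stdio.h>
    #include <stdlib.h>
    #include <math.h>

    int main(void) {
        const char *s = ".93132257461547861902257656912845935892"
                        "608650874535669572651386260986328126e-9";
        double f  = strtod(s, NULL);          /* to nearest, if correctly rounded */
        double lo = 0x1p-30;                  /* 2^-30                            */
        double hi = nextafter(lo, INFINITY);  /* its upper neighbour              */
        printf("2^-30     = %a\n", lo);
        printf("strtod(s) = %a\n", f);        /* equals hi, not lo */
        printf("successor = %a\n", hi);
        return 0;
    }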
>> A big advantage of rounding to nearest is that the result is uniquely
>> defined, whereas for faithful rounding, in general, two results meet
>> the specification.
>>
>> From a numerical point of view, the computing time of an algorithm
>> producing a faithfully rounded result is proportional to the logarithm
>> of the condition number, in my opinion a nice property.
>>
>> For rounding to nearest, the computing time also depends on the
>> nearness of the result to the midpoint of adjacent floating-point
>> numbers (call that the switching point), which is strange from a
>> numerical point of view.
>>
>> Second, how much does it cost?
>> I designed my algorithms to be fast in a high-level language like C or
>> Fortran, and therefore branches and accesses to mantissas or exponents
>> are avoided. For a hardware implementation this changes completely.
>>
>> For summation, fast algorithms are available, and in general there is
>> not much difference in computing time between levels 2, 3, and 4. Of
>> course, if the result is very near a switching point, the time for
>> level 4 increases; but those cases may be rare.
>>
>> For dot products the picture changes. The new difficulties come from
>> more or less constructed examples, namely intermediate under- or
>> overflow. Fast algorithms for level 2 are available here, and they are
>> sufficient for many applications. For levels 3 or 4, care is necessary
>> with intermediate under/overflow; one possibility is scaling, as
>> sketched below. Since I expect under/overflow mainly in more or less
>> constructed examples, the time penalty should in general be small.
>>
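A minimal C sketch of such a constructed example, with power-of-two
scaling as the remedy (illustrative only; a robust implementation would
derive the scale factor from the data):

    #include <stdio.h>

    int main(void) {
        /* Exact dot product is 0, but both partial products overflow. */
        double x[2] = {1e200,  1e200};
        double y[2] = {1e200, -1e200};

        double naive = x[0]*y[0] + x[1]*y[1];   /* inf + (-inf) = NaN */

        /* Scale by a power of two (exact), compute, unscale. */
        double s = 0x1p-200;
        double t = (x[0]*s)*(y[0]*s) + (x[1]*s)*(y[1]*s);
        double scaled = (t / s) / s;            /* divide twice: s*s would underflow */

        printf("naive  = %g\n", naive);         /* nan */
        printf("scaled = %g\n", scaled);        /* 0   */
        return 0;
    }

Scaling by a power of two is exact, so it changes no rounding; it only
moves the computation into a safe exponent range.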
>> With some care, the algorithms already discussed can be adaptated from
>> faithful rounding to rounding to nearest, so that basically a time
>> penalty applies only if exact the result is near a switching point.
>>
>> If one does not like to guess how often intermediate under/overflow or
>> nearness to a switching point occurs, then Malcolm's algorithm or the
>> long accumulator advocated by Kulisch may be used.
>>
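Besides those two, a pure floating-point alternative with the same "no
guessing" property is an expansion in the style of Shewchuk: the running
sum is kept exact as a short array of nonoverlapping doubles. A minimal
sketch (the fixed capacity of 64 components is safe, since a
double-precision expansion needs at most about 40):

    #include <stdio.h>

    #define MAXCOMP 64

    static double comp[MAXCOMP];  /* pieces whose exact sum is the exact */
    static int    ncomp = 0;      /* sum of everything added so far      */

    /* Add b to the expansion (Shewchuk's grow-expansion, via TwoSum). */
    static void acc_add(double b) {
        double q = b;
        int m = 0;
        for (int i = 0; i < ncomp; i++) {
            double s  = comp[i] + q;
            double bv = s - comp[i];
            double e  = (comp[i] - (s - bv)) + (q - bv);  /* TwoSum error */
            if (e != 0.0) comp[m++] = e;
            q = s;
        }
        comp[m++] = q;   /* largest piece last; capacity check omitted */
        ncomp = m;
    }

    int main(void) {
        acc_add(1e100); acc_add(1.0); acc_add(-1e100);
        double s = 0.0;
        for (int i = 0; i < ncomp; i++)
            s += comp[i];          /* small-to-large: accurate rounding */
        printf("%g\n", s);         /* 1, exactly */
        return 0;
    }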
>> Although few people may find it necessary to _always_ have a
>> rounded-to-nearest result, a clear advantage is that the result is
>> unique. From a numerical point of view, even level 2 would be a big
>> step, and it is often sufficient.
>>
>> Best wishes
>>
>> Siegfried M. Rump
>>
>> --=====================================================
>> Prof. Dr. Siegfried M. Rump
>> Institute for Reliable Computing
>> Hamburg University of Technology
>> Schwarzenbergstr. 95
>> 21071 Hamburg
>> Germany
>> phone +49 40 42878 3027
>> fax +49 40 42878 2489
>> http://www.ti3.tu-harburg.de
>>
>> and
>>
>> Visiting Professor at Waseda University
>> Faculty of Science and Engineering
>> Shinjuku Lambdax Bldg. 902
>> 2-4-12 Okubo, Shinjuku-ku
>> Tokyo 169-0072
>> Japan
>> phone/fax in Japan +81 3 5286 3414
> 


-- 

---------------------------------------------------------------
R. Baker Kearfott,    rbk@xxxxxxxxxxxxx   (337) 482-5346 (fax)
(337) 482-5270 (work)                     (337) 993-1827 (home)
URL: http://interval.louisiana.edu/kearfott.html
Department of Mathematics, University of Louisiana at Lafayette
(Room 217 Maxim D. Doucet Hall, 1403 Johnston Street)
Box 4-1010, Lafayette, LA 70504-1010, USA
---------------------------------------------------------------