
Re: Please listen to Ulrich here...

To: stds-1788@xxxxxxxxxxxxxxxxx
Subject: Re: Please listen to Ulrich here...
From: Ian McIntosh <ianm@xxxxxxxxxx>
Date: Fri, 23 Aug 2013 11:19:25 -0400
In-reply-to: <5217373B.3050009@kit.edu>
References: <20130807092428.85A4D228EDF8@zuras.net> <52026FC6.4000306@walster.net> <5203D0AE.6080809@walster.net> <5212F5B5.30300@kit.edu> <52138EB5.1050009@walster.net> <201308201721.r7KHLTwI010457@d01av02.pok.ibm.com> <5217373B.3050009@kit.edu>
Sender: stds-1788@xxxxxxxx

I'd consider voting yes to requiring EDP but no to requiring (instead of recommending) CA. There are other ways to calculate EDP so why require CA?

If I want high-precision calculations, the first thing I'd do is use quad rather than double precision, and quad precision makes CA less practical: the long register has to be roughly 16 times larger, because quad has 4 more exponent bits (15 versus 11) and each extra exponent bit doubles the exponent range.
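
The size of the long register can be estimated with a little arithmetic. This is a minimal sketch. It assumes the register must hold any product of two finite IEEE 754 numbers exactly in fixed point, subnormals included, and it ignores the extra carry guard bits a real design would add. The smallest nonzero product is 2^(2(emin - p + 1)) and the largest is below 2^(2(emax + 1)), so the span is 2(emax - emin + p) bits. The helper name `accumulator_bits` is hypothetical.

```python
def accumulator_bits(emax: int, emin: int, p: int) -> int:
    """Width of a fixed-point register that can hold every product of two
    finite floats with exponent range [emin, emax] and precision p exactly.

    Smallest nonzero product: 2**(2 * (emin - p + 1))  (subnormal * subnormal)
    Largest product:          below 2**(2 * (emax + 1))
    Carry guard bits for long sums are deliberately not included.
    """
    return 2 * (emax - emin + p)


# IEEE 754 binary64: 11 exponent bits, p = 53
binary64 = accumulator_bits(emax=1023, emin=-1022, p=53)
# IEEE 754 binary128: 15 exponent bits, p = 113
binary128 = accumulator_bits(emax=16383, emin=-16382, p=113)

print(binary64, binary128, round(binary128 / binary64, 1))  # 4196 65756 15.7
```

The ratio comes out close to 16, which matches 2^4 for the four extra exponent bits. The precision term p grows too, but the exponent range dominates.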

CA's long register has to be zeroed at the start and converted to a floating-point value at the end, and that takes time. The overhead wouldn't matter for long vectors but could dominate the time for short ones.

CA might be the fastest way to compute an EDP, but it isn't necessarily the fastest way to compute a DP. Making CA handle one add to the long register per cycle may be challenging. If the long register is a single register, it will take time to shift the multiply result to the right position, then time for carries to propagate, potentially a long way. If the long register is implemented as separate registers, one for each group of, say, 64 exponent values, then first the right two (or occasionally one) registers must be read, added to and rewritten, with the possibility of a carry propagating to the left, potentially through many registers. If the next value has a similar exponent, it will need the same registers, either by rereading them or via a bypass. Doing all of this in one cycle at today's clock rates will get complicated.
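
A software sketch of the segmented design may make the carry concern concrete. It assumes a register stored as a list of 64-bit words, least significant first. It also assumes each product has already been converted to a non-negative integer with a known bit position. Signs, the conversion from floating point, and the final rounding are left out. `SegmentedAccumulator` is an illustrative name, not an existing API. An add touches only the words the value overlaps, unless a carry has to ripple further left.

```python
WORD = 64
MASK = (1 << WORD) - 1


class SegmentedAccumulator:
    """Hypothetical long register stored as 64-bit words, least significant word first."""

    def __init__(self, nwords: int) -> None:
        self.words = [0] * nwords  # zeroing costs one write per word

    def add(self, value: int, bit_pos: int) -> int:
        """Add a non-negative integer whose lowest bit sits at bit_pos.

        Returns the number of words read and rewritten, including any
        words reached only by the carry.
        """
        if value < 0 or bit_pos < 0:
            raise ValueError("sketch handles non-negative values and offsets only")
        i, off = divmod(bit_pos, WORD)
        chunk = value << off  # align the value inside word i
        carry = 0
        touched = 0
        while chunk or carry:
            if i >= len(self.words):
                raise OverflowError("carry ran off the top of the register")
            total = self.words[i] + (chunk & MASK) + carry
            self.words[i] = total & MASK
            carry = total >> WORD  # at most 1
            chunk >>= WORD
            i += 1
            touched += 1
        return touched

    def value(self) -> int:
        """Reassemble the register contents as one Python integer."""
        return sum(w << (WORD * k) for k, w in enumerate(self.words))
```

Suppose the low three words are all ones. Then `add(1, 0)` rewrites four words, although the value being added fits in one. In hardware, that ripple is what makes one add per cycle hard to guarantee.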

For those who want speed, many of today's computers have parallel pipelines and vector registers and can calculate 2, 4, 8, 16 or more partial dot products in parallel, to be summed at the end. That is at least competitive with CA and can be faster with no special hardware, although it wouldn't be exact. For double precision, it can be made more accurate with only a constant-factor slowdown by doing the calculations in quad precision. If the exponent range were small, it could often produce the same result as EDP, or very nearly the same.
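
The vector approach amounts to keeping several independent partial sums, one per lane, and combining them at the end. This is a minimal sketch in plain Python. The `lanes` parameter is an arbitrary stand-in for the vector width, and a real implementation would use SIMD registers or an unrolled loop. Each lane still rounds after every add, so the result is generally not exact.

```python
def lane_dot(xs: list[float], ys: list[float], lanes: int = 4) -> float:
    """Dot product with `lanes` independent partial sums, summed at the end.

    Mimics how a vectorized or unrolled loop splits the work: element i goes
    to lane i % lanes. Every multiply and add is an ordinary rounded double
    operation, so this is fast but not exact.
    """
    if len(xs) != len(ys):
        raise ValueError("vectors must have the same length")
    partial = [0.0] * lanes
    for i, (x, y) in enumerate(zip(xs, ys)):
        partial[i % lanes] += x * y
    total = 0.0
    for s in partial:  # final horizontal reduction across lanes
        total += s
    return total
```

`lane_dot([1e16, 1.0, -1e16], [1.0, 1.0, 1.0], lanes=4)` returns 0.0, while the exact value is 1.0. With `lanes=2`, the same data happen to give 1.0. The error therefore depends on how the data fall into lanes.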

On the other side, unless the dot product really is exact before it is rounded to floating point, the error is unknown and creating an interval from it is problematic. EDP can be useful or even essential. CA can be useful; I'm just not convinced it's essential.
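
When the exact value is available, a tight enclosure follows directly. Round the exact value to nearest. If that is not exact, widen by one ulp on the side where the exact value lies. This is a minimal sketch. It uses Python's `fractions.Fraction` as a slow stand-in for an exact accumulator (it is not CA) and `math.nextafter` (Python 3.9+) to step one ulp. `dot_enclosure` is a hypothetical name, and finite inputs and a finite result are assumed.

```python
import math
from fractions import Fraction


def dot_enclosure(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Return (lo, hi): adjacent doubles, or one double twice, enclosing the exact dot product."""
    exact = sum((Fraction(x) * Fraction(y) for x, y in zip(xs, ys)), Fraction(0))
    nearest = float(exact)  # Fraction -> float rounds to nearest
    if Fraction(nearest) == exact:
        return nearest, nearest  # exactly representable: zero-width interval
    if Fraction(nearest) < exact:
        return nearest, math.nextafter(nearest, math.inf)
    return math.nextafter(nearest, -math.inf), nearest
```

Without the exact value, no such guarantee is available from a rounded result alone, which is the point made above.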

- Ian McIntosh
IBM Canada Lab
Compiler Back End Support and Development

Ulrich Kulisch, 08/23/2013 06:19:51 AM:

From: Ulrich Kulisch <ulrich.kulisch@xxxxxxx>
To: Ian McIntosh/Toronto/IBM@IBMCA
Date: 08/23/2013 06:19 AM
Subject: Re: Please listen to Ulrich here...

If you design a chip for a particular application that needs a multiplier, the design tool already provides a predesigned adder tree for fast multiplication. I have never heard people complain that there are other, simpler ways of implementing a multiplier that need less silicon. Multiplication is a basic arithmetic operation, and a certain "waste of silicon" is accepted to make it fast.

Now, the dot product is a fundamental operation in vector and matrix spaces, and it appears again and again in numerical computations and in mathematics. The assertion is: the simplest and fastest way of computing a dot product (of floating-point numbers) is to compute it exactly. It needs just multiplication, shift and add, and a small amount of local memory on the arithmetic unit (which needs less silicon than an adder tree for fast multiplication). There are no normalizations or roundings after multiplications and additions, and no storing and reading of intermediate results. By pipelining, it can be done in the time the processor needs to read the data, i.e., at the utmost speed. A possible carry can be absorbed right away; it does not need additional computing time. From the exact dot product you get a correctly rounded (or otherwise rounded) dot product, of course.
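
The claim that only multiplication, shift and add are needed can be illustrated in software. This is a minimal sketch that assumes IEEE 754 double inputs. Python's arbitrary-precision `int` plays the role of the long fixed-point register, and the names `to_fixed` and `exact_dot` are hypothetical. Every double is an integer multiple of 2^-1074, so every product is an integer multiple of 2^-2148 and can be added exactly. The only rounding happens once, at the end. The sketch says nothing about hardware speed.

```python
SHIFT = 1074  # 2**-1074 is the smallest positive (subnormal) double


def to_fixed(x: float) -> int:
    """Return x * 2**SHIFT as an exact integer (x must be finite)."""
    num, den = x.as_integer_ratio()  # den is a power of two <= 2**SHIFT
    return num * ((1 << SHIFT) // den)


def exact_dot(xs: list[float], ys: list[float]) -> float:
    """Accumulate exact products in one long integer, then round once."""
    acc = 0  # the "long register", zeroed at the start
    for x, y in zip(xs, ys):
        acc += to_fixed(x) * to_fixed(y)  # exact product, scaled by 2**(2 * SHIFT)
    # int / int true division in Python is correctly rounded (round to nearest)
    return acc / (1 << (2 * SHIFT))
```

`exact_dot([1e16, 1.0, -1e16], [1.0, 1.0, 1.0])` returns 1.0, whereas summing the rounded products left to right in double precision returns 0.0.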
In contrast to multiplication, here the complaints come: this is a waste of silicon; there may be other, perhaps simpler, ways of computing a dot product; or why do we need an exact dot product for data that are frequently measured and not exact?

At the SCAN2000 conference in Karlsruhe, Bill Walster gave an invited talk, and I quote a nice paragraph from his lecture concerning interval arithmetic:

Because intervals introduce a "new order of things", we must
".... bear in mind that there is nothing more difficult to execute, nor more dubious for success, nor more dangerous to administer than to introduce a new order of things; for he who introduces it has all those who profit from the old order as his enemies, and he has only lukewarm allies in all those who might profit from the new."
The Prince by Niccolo Machiavelli. First published in 1532, several years after his death.

If you replace the word "intervals" in the first line of this quotation with "exact dot product", you have a nice description of the ongoing discussion on the subject.

By not requiring an EDP, we give up a unique chance to improve our computing tools. So please vote YES on motion 47.

With best regards
Ulrich Kulisch

--
Karlsruher Institut für Technologie (KIT)
Institut für Angewandte und Numerische Mathematik
D-76128 Karlsruhe, Germany
Prof. Ulrich Kulisch
Phone: +49 721 608-42680
Fax: +49 721 608-46679
E-Mail: ulrich.kulisch@xxxxxxx
www.kit.edu
www.math.kit.edu/ianm2/~kulisch/
KIT - University of the State of Baden-Württemberg and National Research Center of the Helmholtz Association
