Dear Ian,
thank you for writing. I think all the problems you see are
solved. I assumed that the reader of my mails is more or less
familiar with the contents of my book:
Computer Arithmetic and Validity - Theory, Implementation, and
Applications. De Gruyter 2008, second edition 2013. You can
probably find it in your library.
I attach copies of a few pages. You should not be shocked by the
large exponent ranges of the IEEE data formats; see the section
"Hardware Accumulation Window" in my book. When we implemented
these ideas for IBM on /370 mainframes in the 1980s, we used only
about 1200 bits for the long accumulator, and I am not aware of
any problem for which this was not enough.
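As a back-of-the-envelope illustration of where such bit counts come from (my own sketch, not taken from the book; the guard-bit count of 92 is an assumed design choice), the width of a fixed-point accumulator that holds any sum of products of two floats exactly can be computed from the format parameters:

```python
# Sketch: how wide must a fixed-point accumulator be to hold any sum
# of products of two p-bit floats exactly? (Hedged back-of-the-envelope
# estimate; the guard-bit count is a design choice.)

def accumulator_bits(emax, emin, p, guard=92):
    # A product of two p-bit floats spans exponents from about
    # 2*emin - 2*(p - 1) up to 2*emax, so the fixed-point window needs
    # 2*(emax - emin) + 2*p bits, plus guard bits so that carries from
    # accumulating up to 2**guard products cannot overflow.
    return 2 * (emax - emin) + 2 * p + guard

print(accumulator_bits(1023, -1022, 53))   # IEEE binary64 -> 4288
```

The much smaller exponent range of the /370 hexadecimal formats is what brings the figure down to roughly 1200 bits there.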
With best regards
Ulrich
On 23.08.2013 17:19, Ian McIntosh wrote:
I'd consider voting yes to
requiring EDP but no to requiring (instead of recommending)
CA. There are other ways to calculate EDP so why require CA?
If I want high precision
calculations, the first thing I'd do is use quad not double
precision, and quad precision makes CA less practical: three
more exponent bits mean a long register eight times the size.
CA's long register has to be
zeroed at the start and converted to a floating point value
at the end and that takes time. That wouldn't matter with
long vectors but could dominate the time with short ones.
CA might be the fastest way to
compute an EDP, but it isn't necessarily the fastest way to
compute a DP. Making CA handle one add to the long register
per cycle may be challenging. If the long register is a
single register it will take time to shift the multiply result
to the right position then time for carries to propagate,
potentially a long way. If the long register is implemented
as separate registers, one for each group of let's say 64
exponents, then first the right two (or occasionally one)
registers must be read, added to and rewritten, with the
possibility of a carry to the left potentially through many
registers. If the next value has a similar exponent then it
will need the same registers, either by rereading them or via
a bypass. Doing it in one cycle at today's clock rates will
get complicated.
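The segmented design sketched above can be modeled in software. Below is a toy Python model (my own sketch, not a hardware design, and restricted to nonnegative addends for simplicity): the long register is an array of 64-bit words, an aligned product is added into one or two of them, and a carry occasionally ripples further left.

```python
WORD = 64
MASK = (1 << WORD) - 1

class LongAccumulator:
    """Toy model of a segmented long register (nonnegative addends only)."""

    def __init__(self, n_words, bias):
        self.words = [0] * n_words
        self.bias = bias              # bit position of the smallest exponent

    def add(self, significand, exponent):
        idx, off = divmod(exponent + self.bias, WORD)
        carry = significand << off    # word-aligned value to add
        # Usually this touches one or two adjacent words; a carry can
        # occasionally ripple through many more.
        while carry:
            total = self.words[idx] + (carry & MASK)
            self.words[idx] = total & MASK
            carry = (carry >> WORD) + (total >> WORD)
            idx += 1

    def value(self):
        # Exact contents as an integer, scaled by 2**bias.
        return sum(w << (i * WORD) for i, w in enumerate(self.words))
```

The `while` loop is exactly the read-add-rewrite-with-possible-carry sequence described above; doing it in one hardware cycle is where the difficulty lies.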
For those who want speed, many
of today's computers have parallel pipelines and vector
registers and can calculate 2, 4, 8, 16 or more partial dot
products in parallel, to be summed at the end. That's at
least competitive with CA and can be faster with no special
hardware, although it wouldn't be exact. For double precision
it can be made more accurate with only a constant time
slowdown by doing the calculations in quad precision. If the
exponent range were small it could often produce the same or
very close to the same result as EDP.
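The lane-wise scheme is easy to sketch in software (a toy Python model under my own assumptions; real implementations would use vector registers and FMA). It also shows why the result need not match the exact dot product: every per-lane addition rounds.

```python
import math

def lane_dot(x, y, lanes=4):
    # One running partial product-sum per lane, folded at the end.
    # Each += here rounds, so the result is not exact in general.
    partial = [0.0] * lanes
    for i, (a, b) in enumerate(zip(x, y)):
        partial[i % lanes] += a * b
    total = 0.0
    for p in partial:
        total += p
    return total

# The products below are exact (multiplication by 1.0), so summing
# them exactly with math.fsum gives the EDP for comparison.
x = [1e16, 1.0, 1.0, -1e16]
y = [1.0, 1.0, 1.0, 1.0]
print(lane_dot(x, y, lanes=2))                 # 0.0: both 1.0s are absorbed
print(math.fsum(a * b for a, b in zip(x, y)))  # 2.0, the exact result
```

With a small exponent range such cancellation is rare, which is why the lane-wise result would then often agree with the EDP.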
On the other side, unless the
dot product really is exact (rounded to floating point) the
error is unknown and creating an interval from it is
problematic. EDP can be useful or even essential. CA can be
useful; I'm just not convinced it's essential.
- Ian McIntosh, IBM Canada Lab, Compiler Back End Support and Development

From: Ulrich Kulisch <ulrich.kulisch@xxxxxxx>
To: Ian McIntosh/Toronto/IBM@IBMCA
Date: 08/23/2013 06:19 AM
Subject: Re: Please listen to Ulrich here...
If you design a chip for a particular application where you need
a multiplier, the design tool already provides an adder tree for
fast multiplication; it is predesigned. I have never heard people
complain that there are other, simpler ways of implementing a
multiplier which need less silicon. Multiplication is a basic
arithmetic operation, and a certain "waste of silicon" is
accepted to get it fast.
Now, the dot product is a fundamental operation in the vector and
matrix spaces, and it appears again and again in numerical
computations and in mathematics. The assertion is: the simplest
and fastest way of computing a dot product (of floating-point
numbers) is to compute it exactly. It needs just multiplication,
shift, and add, plus a small amount of local memory on the
arithmetic unit (which needs less silicon than an adder tree for
fast multiplication). There are no normalizations and roundings
after multiplications and additions, and no storing and reading
of intermediate results. By pipelining, it can be done in the
time the processor needs to read the data, i.e., it comes with
utmost speed. A possible carry can be absorbed right away; it
does not need additional computing time. With the exact dot
product you have a correctly or otherwise rounded dot product, of
course.
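The multiply-shift-add recipe can be imitated in software with arbitrary-precision integers (a hedged Python sketch of the principle, not the hardware; SCALE is my own choice of binary-point offset, generous enough to cover binary64 including subnormals):

```python
import math
from fractions import Fraction

def exact_dot(x, y):
    # Every finite double is an integer significand times a power of
    # two: a == ia * 2**(ea - 53) with |ia| < 2**53. A product is then
    # ia*ib * 2**(ea + eb - 106); shifting all products to a common
    # binary point turns accumulation into plain integer addition,
    # the software analogue of the long accumulator.
    SCALE = 2400                      # fractional bits; covers subnormals
    acc = 0
    for a, b in zip(x, y):
        ma, ea = math.frexp(a)        # a == ma * 2**ea, 0.5 <= |ma| < 1
        mb, eb = math.frexp(b)
        ia, ib = int(ma * 2**53), int(mb * 2**53)
        acc += (ia * ib) << (SCALE + ea + eb - 106)
    # Single correctly rounded conversion at the very end.
    return float(Fraction(acc, 1 << SCALE))

print(exact_dot([1e16, 1.0, 1.0, -1e16], [1.0, 1.0, 1.0, 1.0]))  # 2.0
```

In hardware the shift is a routing step and the addition lands in the accumulation window, so no intermediate result is ever stored or rounded.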
In contrast to multiplication, complaints are coming here: this
is a waste of silicon; there may be other, perhaps simpler ways
of computing a dot product; and why do we need an exact dot
product for data which frequently are measured and not exact?
At the SCAN2000 conference at Karlsruhe Bill Walster gave an
invited talk and I cite a nice paragraph from his lecture
concerning interval arithmetic:
Because intervals introduce a "new order of things", we must
".... bear in mind that there is nothing more difficult to
execute, nor more dubious for success, nor more dangerous to
administer than to introduce a new order of things; for he who
introduces it has all those who profit from the old order as his
enemies, and he has only lukewarm allies in all those who might
profit from the new."
The Prince by Niccolo
Machiavelli. First published in 1532, several years after his
death.
If you in the first line of this citation replace the word
"intervals" by "exact dot product" you have a nice description
of the ongoing discussion on the subject.
By not requiring an EDP we give up a unique chance of improving
our computing tool. So please vote YES on motion 47.
With best regards
Ulrich Kulisch
--
Karlsruher Institut für Technologie (KIT)
Institut für Angewandte und Numerische Mathematik
D-76128 Karlsruhe, Germany
Prof. Ulrich Kulisch
Telefon: +49 721 608-42680
Fax: +49 721 608-46679
E-Mail: ulrich.kulisch@xxxxxxx
www.kit.edu
www.math.kit.edu/ianm2/~kulisch/
KIT - Universität des Landes Baden-Württemberg
und nationales Großforschungszentrum in der
Helmholtz-Gesellschaft