Dear Ian,
Thank you very much for your mail. It is very interesting and I am
happy to have it.
But I feel that we are talking about two different things. You are
talking about computing the dot product (exactly, simply, or fast)
on today's (existing) hardware, while I am talking about computing
an exact dot product on a single (perhaps not yet existing)
processor. I am sure that my method could also be used on parallel
machines, with a speedup that is perhaps a little lower than yours.
Let me comment a little on the two methods. The fast method you
describe, if I understand it correctly, alters the sequence of the
summands. This was also done by the old vector units (of IBM, CRAY,
Fujitsu, Hitachi, or NEC). In an average application this may lead
to a reasonable result. But I have a large collection of examples,
for all these vector units, where the computed result does not
contain a single correct digit; once you know how the summands are
reordered, it is easy to construct such examples.
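To see the effect in the smallest possible setting, here is a
constructed toy case in C (my illustration for this list, not one of
the vector-unit examples): the same three products summed in two
orders.

    #include <stdio.h>

    int main(void) {
        /* In IEEE double precision, 1e16 + 1.0 rounds back to 1e16,
           so the contribution of the first product is lost in one
           summation order and preserved in the other.  The exact
           value of the dot product is 1. */
        double a[3] = {1.0, 1e16, -1e16};
        double b[3] = {1.0, 1.0,  1.0};

        double left_to_right = (a[0]*b[0] + a[1]*b[1]) + a[2]*b[2];
        double reordered     = a[0]*b[0] + (a[1]*b[1] + a[2]*b[2]);

        printf("%g vs %g\n", left_to_right, reordered);  /* 0 vs 1 */
        return 0;
    }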
I need the dot product for implementing interval arithmetic of
dynamic precision, with variable word lengths. In interval arithmetic
I have to deliver a guaranteed answer, so I cannot use an approximate
dot product; I need the dot product exactly. We implemented the exact
dot product as a coprocessor for the PC in the 1990s, and we did it
with IBM for their /370 architecture in the 1980s (see reference [13]
in Motion 9). It was later copied by Siemens and by Hitachi, also for
their /370 architectures. By pipelining you can get it with a speedup
similar to that of the approximate dot products on the old vector
units.
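Since not every reader of this list has seen how such a unit works,
here is a rough software model in C (a sketch for this discussion
only, not the design of our coprocessor; overflow of a product is not
handled, and the final read-out below only approximates the correctly
rounded result): every product is split into two doubles with an fma,
and both halves are added, without any rounding, into one wide
fixed-point register.

    #include <math.h>
    #include <stdint.h>
    #include <string.h>

    /* Model of a long accumulator: a wide two's-complement fixed-point
       register that absorbs any IEEE double without rounding.  The
       sizes are generous rather than minimal. */
    #define ACC_WORDS 40          /* 40 * 64 = 2560 bits               */
    #define ACC_EMIN  (-1140)     /* exponent of the accumulator's LSB */

    typedef struct { uint64_t w[ACC_WORDS]; } acc_t;

    /* Add m * 2^e (m a signed 53-bit integer) exactly into the register. */
    static void acc_add_scaled(acc_t *a, int64_t m, int e) {
        int pos  = e - ACC_EMIN;             /* bit position of m's bit 0 */
        int word = pos >> 6, bit = pos & 63;
        unsigned __int128 v = (unsigned __int128)(__int128)m << bit;
        uint64_t ext = (m < 0) ? ~0ULL : 0;  /* sign extension upward */
        unsigned __int128 c = (unsigned __int128)a->w[word] + (uint64_t)v;
        a->w[word] = (uint64_t)c;  c >>= 64;
        c += (unsigned __int128)a->w[word+1] + (uint64_t)(v >> 64);
        a->w[word+1] = (uint64_t)c;  c >>= 64;
        for (int i = word + 2; i < ACC_WORDS; i++) {  /* ripple the carry */
            c += (unsigned __int128)a->w[i] + ext;
            a->w[i] = (uint64_t)c;  c >>= 64;
        }
    }

    /* Split a double into integer mantissa and exponent; add it exactly. */
    static void acc_add_double(acc_t *a, double x) {
        if (x == 0.0) return;
        int e;
        double f = frexp(x, &e);           /* x = f * 2^e, 0.5 <= |f| < 1 */
        acc_add_scaled(a, (int64_t)ldexp(f, 53), e - 53);
    }

    /* Exact dot product: an fma splits each product into hi + lo (exact
       as long as the product neither overflows nor underflows), both
       halves go into the register, and only the final read-out rounds.
       The read-out here is a simple approximation; the hardware rounds
       the register contents correctly. */
    double exact_dot(const double *x, const double *y, int n) {
        acc_t a;  memset(&a, 0, sizeof a);
        for (int i = 0; i < n; i++) {
            double hi = x[i] * y[i];
            double lo = fma(x[i], y[i], -hi);   /* x*y == hi + lo */
            acc_add_double(&a, hi);
            acc_add_double(&a, lo);
        }
        int neg = (int)(a.w[ACC_WORDS-1] >> 63);
        if (neg) {                              /* negate to get |value| */
            unsigned __int128 c = 1;
            for (int i = 0; i < ACC_WORDS; i++) {
                c += (uint64_t)~a.w[i];  a.w[i] = (uint64_t)c;  c >>= 64;
            }
        }
        double s = 0.0;
        for (int i = ACC_WORDS - 1; i >= 0; i--) /* most significant first */
            if (a.w[i]) s += ldexp((double)a.w[i], ACC_EMIN + 64*i);
        return neg ? -s : s;
    }

In hardware, the carry ripple is what pipelining hides: the unit adds
one product per cycle into the register while the next operands are
being fetched.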
For interval evaluation of an algorithm (a sequence of arithmetic
operations) in the real number field, an old result of interval
arithmetic says, roughly, that increasing the precision by k digits
reduces the error bounds by a factor of b^(-k), where b is the base
(for example, three extra decimal digits shrink the bounds by a
factor of 1000). That is, results can always be guaranteed to a
prescribed number of correct digits by using variable-precision
interval arithmetic. All intervallers have dreamed of this kind of
application for half a century. With today's technology it is within
immediate reach. The exact dot product is the central building block
for it, and I think it is a must that IEEE P1788 requires it.
Best regards
Ulrich
On 09.02.2012 04:24, Ian McIntosh wrote:
UK> Actually, the simplest and fastest way to compute a dot product is to
UK> compute it exactly! By pipelining, it can be computed in the time the
UK> processor needs to read the data, i.e., it comes with utmost speed.
There are three ways to calculate a dot product: exact, simple, or
fast.
Computing a dot product either exactly or simply is generally not
the fastest way on today's hardware. There is a faster way on any
system that takes more than one cycle to calculate sum + a*b, or
that has multiple floating-point pipelines (including vector
register operations, multiple CPUs, or multiple cores), and that can
also transfer data quickly enough.
For a simple example, assume just one pipeline, using an fma to
calculate sum = sum + a*b in one operation but taking 2 cycles to
produce the result. You can split the vectors in half, compute the
dot product of each half, and add the two partial sums at the end.
By interleaving the two halves one cycle apart you almost double the
speed.
For another simple example, assume two pipelines, each able to
calculate sum = sum + a*b in one fma operation and taking only 1
cycle to produce the result. Change the calculation in the same way
as above and you will double the speed.
If the original program is
    sum = 0.;
    for (i = 0; i < n; i = i+1)
    {
        sum = sum + a[i]*b[i];
    }
one way is to change it to
    sum1 = 0.; sum2 = 0.;                 /* done in parallel */
    for (i = 0; i+1 < n; i = i+2)
    {
        sum1 = sum1 + a[i]*b[i];          /* done in parallel */
        sum2 = sum2 + a[i+1]*b[i+1];
    }
    sum = sum1 + sum2;
    if ((n % 2) != 0)
    {
        sum = sum + a[n-1]*b[n-1];
    }
Extending this, it's common on today's CPUs to compute dot
products almost 8 or 16 or 32 times as fast as the
straightforward way. On highly parallel systems with enough data
it can be thousands of times faster. The catch is that due to a
different order of rounding, the result may differ. The IBM XL
compilers and others do this transformation automatically when
the compiler optimization options allow it.
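Generalizing the two-accumulator example to NACC independent partial
sums (a sketch with a constant and a function name of my choosing,
not actual XL compiler output) is roughly what an auto-vectorizing
compiler produces:

    /* The two-accumulator idea generalized to NACC partial sums; NACC
       independent dependency chains keep NACC-wide fma units busy. */
    #define NACC 4

    double fast_dot(const double *a, const double *b, int n) {
        double sum[NACC] = {0.0};
        int i;
        for (i = 0; i + NACC <= n; i += NACC)     /* main unrolled loop */
            for (int k = 0; k < NACC; k++)
                sum[k] = sum[k] + a[i+k]*b[i+k];
        for (; i < n; i++)                        /* leftover elements */
            sum[0] = sum[0] + a[i]*b[i];
        double s = 0.0;
        for (int k = 0; k < NACC; k++)            /* combine partial sums */
            s += sum[k];
        return s;
    }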
One reason Fortran has a dotproduct function is to explicitly allow
this kind of performance boost, even without compiler optimization,
when the user doesn't mind minor differences in the result in
exchange for an order-of-magnitude speedup. The fact that in Fortran
the fast function is named just "dotproduct" is one reason I'd
prefer using exactdotproduct for the exact version; the other is
that I dislike ambiguity.
This doesn't rule out constructing computers with fast parallel
dot product hardware, but AFAIK they don't exist yet.
- Ian McIntosh
  IBM Canada Lab
  Compiler Back End Support and Development
From: Ulrich Kulisch <Ulrich.Kulisch@xxxxxxxxxxx>
To: Ian McIntosh/Toronto/IBM@IBMCA
Date: 02/07/2012 01:44 PM
Subject: Re: Motion 31: V04.2 Revision of proposed Level 1 text
On 07.02.2012 13:08, John Pryce wrote:
Ulrich, Vincent

On 26 Jan 2012, at 13:02, Vincent Lefevre wrote:
> On 2012-01-25 12:00:38 +0000, John Pryce wrote:
>> I send herewith a revised draft of the proposed Level 1 text, less
>> decorations, that I originally circulated on 5 Dec 2011.
> About the basic arithmetic operations:
> ...
> But why is the dot product included in the basic arithmetic
> operations? I don't see it as important as +, -, *, /. And not more
> important than recip, sqr, case, pown, abs, and perhaps some other
> operations.

Ulrich: on reflection I agree with Vincent. However useful the dot
product is, it is essentially more complicated than +, -, *, /. I
think calling it "basic" will cause more disagreement than
agreement, so I have removed it from those lists.

John P
John,
I totally disagree with this decision. I am very busy with other
things right now, but let me simply remind you that the IFIP Working
Group on Numerical Software called for the exact dot product in a
letter to the standards committee IEEE 754R (dated Sept. 4, 2007)
and in another letter to IEEE P1788 (dated Sept. 9, 2009). I attach
copies of these letters to this mail.
I also disagree with the statement that the exact dot product is
essentially more complicated than +, -, *, /.
Actually, the simplest and fastest way to compute a dot product is
to compute it exactly! By pipelining, it can be computed in the time
the processor needs to read the data, i.e., it comes with utmost
speed. No software simulation can compete with a simple and direct
hardware solution. I attach a copy of the poster that I prepared for
the SCAN meeting at Lyon in 2010.
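For contrast, the best that portable software on today's commodity
hardware typically offers is twice the working precision, via
error-free transformations. Here is a sketch of the compensated dot
product of Ogita, Rump, and Oishi (an illustration only, not the
exact dot product argued for here):

    #include <math.h>

    /* Compensated dot product ("Dot2" of Ogita, Rump, and Oishi):
       each product and each addition is split into result + rounding
       error, and the errors are accumulated separately.  The result
       is as good as if computed in twice the working precision --
       much better than the plain loop, but, unlike a long
       accumulator, still not exact. */
    double dot2(const double *x, const double *y, int n) {
        double s = 0.0, c = 0.0;
        for (int i = 0; i < n; i++) {
            double p  = x[i] * y[i];
            double ep = fma(x[i], y[i], -p);  /* x*y == p + ep exactly   */
            double t  = s + p;                /* TwoSum: s + p == t + es */
            double z  = t - s;
            double es = (s - (t - z)) + (p - z);
            s = t;
            c += ep + es;                     /* gather rounding errors  */
        }
        return s + c;
    }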
Let me finally mention that the following is shown in my book
'Computer Arithmetic and Validity' (de Gruyter, 2008). With the two
requirements of the IFIP Working Group letter to IEEE 754R, fast
hardware support for interval arithmetic and the exact dot product,
all operations in the usual product spaces of computation (the
complex numbers, the real and complex intervals, the real and
complex vectors and matrices, and the real and complex interval
vectors and interval matrices) can be computed with least-bit
accuracy and at very high speed. These operations are distinctly
different from those traditionally available on computers. This
would boost both the speed of a computation and the accuracy of its
result.
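The smallest case already shows what this means: the real part of a
complex product (a+ci)(b+di) is the length-2 dot product a*b - c*d.
With an fma it can be computed to a couple of ulps by a well-known
algorithm of Kahan (a sketch, not taken from the book); the exact
dot product does the corresponding job for every length, with a
guaranteed result:

    #include <math.h>

    /* Kahan's fma-based evaluation of a*b - c*d, the length-2 dot
       product at the heart of complex multiplication.  The fma
       recovers the rounding error of c*d exactly, so the result
       stays accurate even when a*b and c*d cancel. */
    double ab_minus_cd(double a, double b, double c, double d) {
        double w = c * d;
        double e = fma(c, d, -w);   /* c*d == w + e exactly */
        double f = fma(a, b, -w);   /* rounded a*b - w      */
        return f - e;               /* approx. a*b - c*d    */
    }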
Best regards
Ulrich
[attachment "IFIPWG-IEEE754R.pdf" deleted by Ian McIntosh/Toronto/IBM]
[attachment "IFIPWG-IEEE-P1788.pdf" deleted by Ian McIntosh/Toronto/IBM]
[attachment "Poster22.pdf" deleted by Ian McIntosh/Toronto/IBM]
--
Karlsruher Institut für Technologie (KIT)
Institut für Angewandte und Numerische Mathematik (IANM2)
D-76128 Karlsruhe, Germany
Prof. Ulrich Kulisch
Phone: +49 721 608-42680
Fax: +49 721 608-46679
E-Mail: ulrich.kulisch@xxxxxxx
www.kit.edu
www.math.kit.edu/ianm2/~kulisch/
KIT - University of the State of Baden-Württemberg and National
Research Center of the Helmholtz Association