
Re: ExactDotProduct



Michel Hack wrote:
James Demmel wrote:
My point was that even a single hardware register is implicitly sorting,
bucket sorting by exponents, because it needs to be able to add and
possibly cancel operands with overlapping mantissas.

Ulrich's point was that a hardware implementation simply uses a subset
of the exponent bits to index into the correct bucket, so to speak.
Carries may of course propagate into higher-exponent buckets, and in
order to reduce carry propagation, one would use two accumulators
in practice, summing positive and negative terms separately, and
doing one big subtraction at the end.  In a parallel environment
one might use several accumulator pairs.
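
As a rough software illustration of the accumulator-pair idea (not the
hardware design discussed here), the sketch below keeps two non-negative
fixed-point accumulators and subtracts once at the end. It assumes IEEE
double inputs and models each accumulator with a Python integer scaled by
2**2148; the names exact_dot, to_fixed and SCALE_BITS are made up for
this example.

```python
from fractions import Fraction

# Assumption: inputs are finite IEEE doubles, so every product x*y is an
# integer multiple of 2**-2148 (each factor is a multiple of 2**-1074).
SCALE_BITS = 2 * 1074

def to_fixed(x):
    """Return (m, e) with x == m * 2**e exactly, for a finite double x."""
    num, den = x.as_integer_ratio()      # den is always a power of two
    return num, -(den.bit_length() - 1)

def exact_dot(xs, ys):
    """Exact dot product using one accumulator per sign."""
    pos = 0  # sum of the magnitudes of the positive products
    neg = 0  # sum of the magnitudes of the negative products
    for x, y in zip(xs, ys):
        mx, ex = to_fixed(x)
        my, ey = to_fixed(y)
        # Exact product as an integer in units of 2**-SCALE_BITS.
        term = (mx * my) << (ex + ey + SCALE_BITS)
        if term >= 0:
            pos += term      # additions only, so carries only move upward
        else:
            neg += -term
    # The one big subtraction at the end.
    return Fraction(pos - neg, 1 << SCALE_BITS)
```

With this sketch, exact_dot([1e100, 1.0, -1e100], [1.0, 1.0, 1.0])
returns exactly 1, whereas naive left-to-right double summation of the
same products loses the 1.0. Calling float() on the result rounds it
once, at the very end.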

Yes, that is a good idea. We did it this way in an early implementation of our Pascal-XSC in 1980. But it doubles the necessary register space, and this can be avoided: a very simple technique is now available to handle the carry propagation. See Section 8.4.4, "Fast Carry Resolution", on page 261 of my book [9].

Ulrich
The hardware primitive would be shift-and-add (to position the addend
relative to the 64-bit buckets, say).
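
A minimal software sketch of such a shift-and-add step, under some
assumptions: the operands are non-negative IEEE doubles (a sum rather
than a product accumulator), buckets are 64 bits wide, and the
accumulator is a plain list of words. The names shift_and_add, split,
to_int, and the size NWORDS are hypothetical, and the word-by-word carry
loop is the naive ripple scheme, not the fast carry resolution mentioned
above.

```python
import math

WORD = 64
MASK = (1 << WORD) - 1
BIAS = 1074    # 2**-1074 is the smallest positive double
NWORDS = 40    # assumed size: 33 words span the double range, rest is carry headroom

def split(x):
    """Return (m, e) with x == m * 2**e and e >= -BIAS, for a finite double x >= 0."""
    frac, exp = math.frexp(x)            # x == frac * 2**exp, 0.5 <= frac < 1
    m, e = int(math.ldexp(frac, 53)), exp - 53
    if e < -BIAS:                        # subnormal: the low bits of m are zero
        m >>= -BIAS - e
        e = -BIAS
    return m, e

def shift_and_add(words, m, e):
    """Add the non-negative value m * 2**e into the list of 64-bit buckets."""
    # High bits of the biased exponent select the bucket, low bits the shift.
    idx, shift = divmod(e + BIAS, WORD)
    chunk = m << shift                   # addend aligned to the bucket boundary
    while chunk:
        total = words[idx] + (chunk & MASK)
        words[idx] = total & MASK
        # Remaining high bits of the addend plus the carry out of this bucket.
        chunk = (chunk >> WORD) + (total >> WORD)
        idx += 1

def to_int(words):
    """Reassemble the buckets into one integer in units of 2**-BIAS."""
    n = 0
    for w in reversed(words):
        n = (n << WORD) | w
    return n
```

For example, starting from words = [0] * NWORDS and calling
shift_and_add(words, *split(x)) for each x leaves the exact sum in
to_int(words), in units of 2**-1074. A carry out of a full bucket
ripples into the next higher one, as described above.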

Michel.
---Sent: 2009-11-03 20:06:08 UTC