Re: ExactDotProduct
James Demmel wrote:
> My point was that even a single hardware register is implicitly sorting,
> bucket sorting by exponents, because it needs to be able to add and
> possibly cancel operands with overlapping mantissas.
Ulrich's point was that a hardware implementation simply uses a subset
of the exponent bits to index into the correct bucket, so to speak.
Carries may of course propagate into higher-exponent buckets, and in
order to reduce carry propagation, one would use two accumulators
in practice, summing positive and negative terms separately, and
doing one big subtraction at the end. In a parallel environment
one might use several accumulator pairs.
The hardware primitive would be shift-and-add (to position the addend
relative to the 64-bit buckets, say).
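
To make the idea concrete, here is a rough software sketch of such an
accumulator pair for IEEE doubles. It is only an illustration under my
own assumptions, not a description of any actual hardware: the names
(BUCKET_BITS, BIAS, NBUCKETS, decompose, add_magnitude, exact_dot) are
made up, inputs are assumed finite, Python integers stand in for the
64-bit registers, and the final subtraction uses Python's big integers
for brevity.

```python
import math
from fractions import Fraction

BUCKET_BITS = 64                 # width of one bucket (assumed)
MASK = (1 << BUCKET_BITS) - 1
BIAS = 2252                      # accumulator bit 0 stands for 2**-BIAS
NBUCKETS = 72                    # covers every double product, plus carry headroom


def decompose(x):
    """Return integers (m, e) with x == m * 2**e exactly (x must be finite)."""
    frac, exp = math.frexp(x)            # x == frac * 2**exp, 0.5 <= |frac| < 1
    return int(frac * 2**53), exp - 53   # scaling by 2**53 is exact


def add_magnitude(acc, m, e):
    """Shift-and-add the non-negative value m * 2**e into the bucket list acc."""
    pos = e + BIAS                            # bit position of m's lowest bit
    idx, shift = divmod(pos, BUCKET_BITS)     # high bits pick the bucket,
    chunk = m << shift                        # low bits give the shift
    while chunk:                              # continues while payload or carry remains
        total = acc[idx] + (chunk & MASK)
        acc[idx] = total & MASK
        chunk = (chunk >> BUCKET_BITS) + (total >> BUCKET_BITS)  # carry upward
        idx += 1


def to_int(acc):
    """Reassemble the buckets into one big integer."""
    return sum(b << (BUCKET_BITS * i) for i, b in enumerate(acc))


def exact_dot(xs, ys):
    """Exact dot product of two sequences of finite doubles, as a Fraction."""
    pos_acc = [0] * NBUCKETS     # sum of the positive products
    neg_acc = [0] * NBUCKETS     # sum of the magnitudes of the negative products
    for x, y in zip(xs, ys):
        mx, ex = decompose(x)
        my, ey = decompose(y)
        m = mx * my              # exact product mantissa, at most 106 bits
        if m >= 0:
            add_magnitude(pos_acc, m, ex + ey)
        else:
            add_magnitude(neg_acc, -m, ex + ey)
    # One big subtraction at the end.
    return Fraction(to_int(pos_acc) - to_int(neg_acc), 1 << BIAS)
```

For example, float(exact_dot([1e100, 1.0, -1e100], [1.0, 1.0, 1.0]))
gives 1.0, whereas summing the same products left to right in double
precision gives 0.0. Several accumulator pairs could be filled
independently and merged bucket by bucket before the final subtraction.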
Michel.
---Sent: 2009-11-03 20:06:08 UTC