Thread Links Date Links
Thread Prev Thread Next Thread Index Date Prev Date Next Date Index

Re: ExactDotProduct



My point was that even a single hardware register is implicitly sorting, bucket sorting by exponents, because it needs to be able to add and possibly cancel operands with
overlapping mantissas.

Imagine a list of summands
    (x(1), -x(1), x(2), -x(2),....,x(n),-x(n), x(n+1))
whose sum is x(n+1), and where no two summands +-x(i) and +-x(j) overlap in
their mantissas unless i=j (for a given floating point format, this limits n, but in my email I said the algorithm had to be correct for any format). Suppose they are supplied in some
scrambled order. Then to get the correct sum you need to find the summands
exceeding x(n+1) in magnitude, and realize that they cancel, and find the summands smaller than x(n+1) in magnitude, and realize that they do not matter. This seems
tantamount to sorting, at least from a theoretical point of view.
Jim Demmel

Ulrich Kulisch wrote:
Jim Demmel wrote:

> Though I have not written this down formally, I think that an algorithm > that is correct for any underlying number of mantissa and exponent bits
> must in effect do as much work as sorting (by the exponents).
There are plenty of sketches for hardware circuitries of the exact dot product in my book ([9] in the proposal). Non of them uses sorting. The result is independent of the sequence in which the summands are added. All these circuits are simple and extremely fast (like vector operations on conventional vector processors). Of course, these techniques can also be implemented in software.

Best regards
Ulrich Kulisch