Re: ExactDotProduct
My point was that even a single hardware register is implicitly sorting,
bucket sorting
by exponents, because it needs to be able to add and possibly cancel
operands with
overlapping mantissas.
Imagine a list of summands
(x(1), -x(1), x(2), -x(2),....,x(n),-x(n), x(n+1))
whose sum is x(n+1), and where no two summands +-x(i) and +-x(j) overlap in
their mantissas unless i=j (for a given floating point format, this
limits n, but in my email
I said the algorithm had to be correct for any format). Suppose they are
supplied in some
scrambled order. Then to get the correct sum you need to find the summands
exceeding x(n+1) in magnitude, and realize that they cancel, and find
the summands
smaller than x(n+1) in magnitude, and realize that they do not matter.
This seems
tantamount to sorting, at least from a theoretical point of view.
Jim Demmel
Ulrich Kulisch wrote:
Jim Demmel wrote:
> Though I have not written this down formally, I think that an
algorithm
> that is correct for any underlying number of mantissa and exponent
bits
> must in effect do as much work as sorting (by the exponents).
There are plenty of sketches for hardware circuitries of the exact dot
product in my book ([9] in the proposal). Non of them uses sorting.
The result is independent of the sequence in which the summands are
added. All these circuits are simple and extremely fast (like vector
operations on conventional vector processors). Of course, these
techniques can also be implemented in software.
Best regards
Ulrich Kulisch