
Re: Motion P1788/M0009.01_ExactDotProduct



There are a number of papers on accurate summations and dot products in software (the most recent paper I know of is by Zhu and Hayes in SISC, 2009), exhibiting a big design space with lots of tradeoffs in space, time and accuracy (correctly rounded vs faithful vs at most a few ulps of error).

Though I have not written this down formally, I think that an algorithm that is correct for any underlying number of mantissa and exponent bits must in effect do as much work as sorting (by the exponents). This can be done in a big hardware register (basically bucket sort), or by distillation (bucket sort and merge sort have been proposed), or by explicit sorting (as in my algorithm with Hida).
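
To make the "big register" idea concrete, here is a minimal software sketch in Python. It is an illustration only, not any of the published algorithms: it assumes IEEE double inputs that are finite, and it uses an arbitrary-precision integer in place of a fixed-width hardware accumulator. The names SCALE_BITS, to_fixed and exact_dot are hypothetical, chosen for this example.

```python
from fractions import Fraction

# Every finite IEEE double is an integer multiple of 2**-1074 (the smallest
# subnormal), so every product of two doubles is a multiple of 2**-2148.
SCALE_BITS = 1074  # assumption: binary64 inputs


def to_fixed(x):
    """Return the integer n such that x == n * 2**-SCALE_BITS (finite x only)."""
    n, d = x.as_integer_ratio()  # d is a power of two, at most 2**1074
    return n * ((1 << SCALE_BITS) // d)


def exact_dot(xs, ys):
    """Accumulate all products exactly, then round once to the nearest double."""
    acc = 0  # plays the role of the long fixed-point register
    for x, y in zip(xs, ys):
        acc += to_fixed(x) * to_fixed(y)
    # Integer true division inside Fraction-to-float conversion rounds correctly.
    return float(Fraction(acc, 1 << (2 * SCALE_BITS)))
```

For example, exact_dot([1e16, 1.0, -1e16], [1.0, 1.0, 1.0]) returns 1.0, whereas a left-to-right floating-point loop loses the 1.0 to rounding. The sketch trades speed for simplicity, and it raises an exception on infinities, NaNs, or a result too large for a double.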

Hopefully any statement in the standard will not discourage implementors from
continuing to explore this space.

Jim Demmel


Ralph Baker Kearfott wrote:
Michael et al,

I'd like to remind people that Ulrich and others have argued in the
past that accurate dot product should be implemented in hardware because
it is otherwise too slow.  On the other hand, for what it's worth,
Rump, Oishi, et al. have developed "almost accurate" dot product
algorithms that can be implemented efficiently in current IEEE-754
conforming hardware.
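
As a rough illustration of what such an "almost accurate" algorithm can look like, the sketch below follows the general shape of a compensated dot product built from error-free transformations (a TwoSum and a Dekker-style TwoProduct). It is a sketch under assumptions, not a transcription of any particular paper: it assumes binary64 arithmetic with round-to-nearest, no overflow or underflow in the intermediate products, and non-empty inputs. The function names are chosen for this example.

```python
def two_sum(a, b):
    """Return (s, e) with s = fl(a + b) and a + b == s + e exactly."""
    s = a + b
    bb = s - a
    e = (a - (s - bb)) + (b - bb)
    return s, e


def split(a):
    """Split a into hi + lo, each with at most 26 significant bits."""
    c = 134217729.0 * a  # 2**27 + 1
    hi = c - (c - a)
    return hi, a - hi


def two_product(a, b):
    """Return (p, e) with p = fl(a * b) and a * b == p + e (barring over/underflow)."""
    p = a * b
    ah, al = split(a)
    bh, bl = split(b)
    e = al * bl - (((p - ah * bh) - al * bh) - ah * bl)
    return p, e


def compensated_dot(xs, ys):
    """Dot product that carries the rounding errors in a second accumulator."""
    p, s = two_product(xs[0], ys[0])
    for x, y in zip(xs[1:], ys[1:]):
        h, r = two_product(x, y)
        p, q = two_sum(p, h)
        s += q + r  # accumulate the low-order error terms
    return p + s
```

On the inputs [1e16, 1.0, -1e16] and [1.0, 1.0, 1.0], compensated_dot returns 1.0 while the plain loop returns 0.0. The result is typically about as good as computing in twice the working precision, which is not the same guarantee as a correctly rounded exact dot product.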

I'm not sure how all this should affect whether or not we require an
accurate dot product, but it's relevant.

Baker

Michael Schulte wrote:
George,
.
.
.

However, since std-1788 does not require that everything be implemented in hardware, I think we should include exact dot products in the standard and then people can decide if they want to implement them in hardware or software. My impression is that a software implementation of exact dot product would not be prohibitive (except possibly in some very cost-constrained embedded devices).

Best regards,
Mike