
Re: ExactDotProduct

To: Dan Zuras Intervals <intervals08@xxxxxxxxxxxxxx>
Subject: Re: ExactDotProduct
From: Ulrich Kulisch <Ulrich.Kulisch@xxxxxxxxxxx>
Date: Wed, 04 Nov 2009 11:26:34 +0100
Cc: Michel Hack <hack@xxxxxxxxxxxxxx>, stds-1788@xxxxxxxxxxxxxxxxx

Dan Zuras Intervals wrote:

Date: Tue, 03 Nov 2009 14:57:50 -0500
To: stds-1788 <stds-1788@xxxxxxxxxxxxxxxxx>
From: Michel Hack <hack@xxxxxxxxxxxxxx>
Subject: Re: ExactDotProduct

James Demmel wrote:

My point was that even a single hardware register is implicitly sorting,
bucket sorting by exponents, because it needs to be able to add and
possibly cancel operands with overlapping mantissas.

Ulrich's point was that a hardware implementation simply uses a subset
of the exponent bits to index into the correct bucket, so to speak.
Carries may of course propagate into higher-exponent buckets, and in
order to reduce carry propagation, one would use two accumulators
in practice, summing positive and negative terms separately, and
doing one big subtraction at the end.  In a parallel environment
one might use several accumulator pairs.

The hardware primitive would be shift-and-add (to position the addend
relative to the 64-bit buckets, say).
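
The bucket indexing and shift-and-add step described above can be
sketched in software.  What follows is only an illustration under
stated assumptions, not the hardware design discussed in this thread:
it assumes IEEE binary64 operands, an accumulator of 67 words of
64 bits whose lowest bit has weight 2**-2148, and the hypothetical
helper names split, shift_and_add and exact_dot.

```python
import math
from fractions import Fraction

WORD = 64
MASK = (1 << WORD) - 1
LSB_EXP = -2148   # assumed weight of accumulator bit 0: 2**LSB_EXP
NWORDS = 67       # assumed size: 67 * 64 = 4288 bits

def split(x):
    """Return (mant, exp) with x == mant * 2**exp and exp >= -1074."""
    m, e = math.frexp(x)                 # x == m * 2**e, 0.5 <= |m| < 1
    mant, exp = int(m * 2**53), e - 53
    if exp < -1074:                      # subnormal: low bits of mant are zero
        mant >>= -1074 - exp
        exp = -1074
    return mant, exp

def shift_and_add(acc, mag, exp):
    """Add mag * 2**exp (mag >= 0) into the list of 64-bit words acc."""
    idx, off = divmod(exp - LSB_EXP, WORD)   # bucket index, shift within bucket
    chunk = mag << off
    while chunk:                             # carries ripple into higher buckets
        total = acc[idx] + (chunk & MASK)
        acc[idx] = total & MASK
        chunk = (chunk >> WORD) + (total >> WORD)
        idx += 1

def exact_dot(xs, ys):
    """Exact dot product of two binary64 sequences, returned as a Fraction."""
    pos = [0] * NWORDS                       # accumulator for positive terms
    neg = [0] * NWORDS                       # accumulator for negative terms
    for x, y in zip(xs, ys):
        mx, ex = split(x)
        my, ey = split(y)
        p = mx * my                          # exact product, below 2**106
        shift_and_add(pos if p >= 0 else neg, abs(p), ex + ey)
    def to_int(acc):
        return sum(w << (WORD * i) for i, w in enumerate(acc))
    # one big subtraction at the end
    return Fraction(to_int(pos) - to_int(neg)) * Fraction(2) ** LSB_EXP
```

For instance, exact_dot([1e16, 1.0, -1e16], [1.0, 1.0, 1.0]) yields
exactly 1, whereas a plain left-to-right floating-point loop over the
same terms loses the 1.0 and returns 0.0.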

Michel.
---Sent: 2009-11-03 20:06:08 UTC


	Michel,

	I suspect Jim understands this point.

	On the contrary, Ulrich's approach uses a SUPER set of
	exponent bits to index into a table of registers (together
	with their carry bits) to implement what is known as a
	radix sort.  While this may be O(1) in time it is O(2^e)
	in space, where e is at least one more than your exponent
	bits.

	And the space involved is not memory.  It is registers.
	Many registers.  Many many registers.
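
	To put a rough number on the space argument, the width of
	such an accumulator can be worked out for IEEE binary64.
	This is a back-of-the-envelope sketch; the function name is
	hypothetical, and the 92 guard bits are an assumption chosen
	here so that the total is a whole number of 64-bit words.

```python
def accumulator_bits(emax=1023, min_subnormal_exp=-1074,
                     guard_bits=92, word=64):
    """Size a fixed-point accumulator holding any sum of binary64 products."""
    # Smallest nonzero product: (2**-1074)**2, so the lowest bit weighs 2**-2148.
    lowest = 2 * min_subnormal_exp
    # Operands are below 2**(emax+1), so products are below 2**(2*(emax+1)).
    highest = 2 * (emax + 1) - 1
    product_bits = highest - lowest + 1
    total_bits = product_bits + guard_bits   # headroom for accumulated carries
    words = -(-total_bits // word)           # ceiling division
    return product_bits, total_bits, words

print(accumulator_bits())   # (4196, 4288, 67)
```

	With 11 exponent bits this is on the order of 2^12 bit
	positions, i.e. 67 registers of 64 bits under the assumptions
	above.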

	Even today when computers are cheap enough to throw away
	with your clothes, registers are expensive in the design
	of such machines.

Dan:

I agree that the use of what you call the 'super register' would be the ideal and the fastest solution. To add a product to the 'super register', only two memory words must be read. The result of the accumulation of the product appears in the 'super register'.

The basic idea discussed here was realized on IBM, SIEMENS, and Hitachi computers 25 years ago. These computers did not provide enough register space. So here the 'super register' was placed in the user memory. The disadvantage of this solution is that for each accumulation step, four memory words (the three words to which the product is added and the word which absorbs the carry) must be read and written in addition to the two operand loads. So the scalar product computation is slower than with a 'super register'. However, in practice today the necessary memory space would probably be in cache. So the loss in speed would be marginal. Anyway, compared with any (even fast) software solution the gain in speed would still be tremendous.
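
The memory traffic of such an in-memory accumulation step can be illustrated with a small model. This is a sketch under assumptions, not a description of the IBM, SIEMENS, or Hitachi implementations: it assumes 64-bit words and a product magnitude below 2**106, the names CountingMemory and add_product are hypothetical, and a longer carry ripple (rare in practice) is handled by a simple loop.

```python
WORD = 64
MASK = (1 << WORD) - 1

class CountingMemory:
    """Word-addressed memory that counts loads and stores (hypothetical)."""
    def __init__(self, nwords):
        self.words = [0] * nwords
        self.reads = 0
        self.writes = 0

    def load(self, i):
        self.reads += 1
        return self.words[i]

    def store(self, i, value):
        self.writes += 1
        self.words[i] = value & MASK

def add_product(mem, mag, pos):
    """Add mag * 2**pos (0 <= mag < 2**106) into the accumulator in mem."""
    idx, off = divmod(pos, WORD)
    chunk = mag << off              # at most 106 + 63 = 169 bits: three words
    carry = 0
    for i in range(idx, idx + 3):   # the three words the product overlaps
        total = mem.load(i) + (chunk & MASK) + carry
        mem.store(i, total)
        carry = total >> WORD
        chunk >>= WORD
    i = idx + 3                     # the word that absorbs the carry
    total = mem.load(i) + carry
    mem.store(i, total)
    carry = total >> WORD
    while carry:                    # rare: the absorbing word overflowed too
        i += 1
        total = mem.load(i) + carry
        mem.store(i, total)
        carry = total >> WORD

mem = CountingMemory(67)
add_product(mem, (1 << 106) - 1, 10)
print(mem.reads, mem.writes)        # 4 4
```

In this model each step costs four loads and four stores on the accumulator on top of the two operand loads, which is the overhead relative to a register-resident accumulator.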

Generally, realization of Motion 9 will probably be architecture dependent. I have the highest opinion of the hardware designers and I am convinced that they will develop clever and intelligent ideas if Motion 9 gets accepted.


Best regards
Ulrich
