
Motion 9 ExactDotProduct

To: stds-1788@xxxxxxxx
Subject: Motion 9 ExactDotProduct
From: Ulrich Kulisch <Ulrich.Kulisch@xxxxxxxxxxx>
Date: Wed, 16 Dec 2009 10:37:17 +0100

Dear all:

Of course, I am very happy that Motion 9 passed.

But, reading the comments that came with the 'NO' votes over and over again, I severely doubt whether Motion 9 was really understood. 'Simplicity' is required. 'The standard should not be too heavy'.

Actually, Motion 9 describes the simplest and fastest way of computing a dot product. Take a chest of drawers with 67 numbered drawers. Each one holds 64 bits. The exponent of the summand (the product) consists of 12 bits. The leading 6 bits give the address of the three consecutive drawers to which the summand of 106 bits is added. The low-order 6 bits of the exponent are used for the correct positioning of the summand within the selected drawers. The addition affects at most 170 bits of these drawers (the 106-bit summand shifted by up to 63 positions, plus a possible carry), i.e., an adder of 170 bits could execute the addition in a single add cycle.
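A minimal Python sketch of this addressing scheme may help. It assumes non-negative summands given as a 106-bit integer mantissa and an already biased, non-negative 12-bit exponent; sign handling is omitted. Python integers stand in for the 64-bit drawers, and the names `new_accumulator`, `add_summand` and `accumulator_value` are hypothetical. A carry out of the three selected drawers is simply rippled upward here rather than handled with the flags described in the message.

```python
WORD_BITS = 64             # each drawer holds 64 bits
NUM_WORDS = 67             # 67 numbered drawers
MASK = (1 << WORD_BITS) - 1


def new_accumulator():
    """Return an empty chest of drawers (all bits zero)."""
    return [0] * NUM_WORDS


def add_summand(drawers, mantissa, exponent):
    """Add the non-negative summand mantissa * 2**exponent into the drawers.

    Assumes a 106-bit mantissa and an already biased 12-bit exponent.
    """
    if not (0 <= mantissa < (1 << 106) and 0 <= exponent < (1 << 12)):
        raise ValueError("summand out of range")
    address = exponent >> 6    # leading 6 bits: first of three drawers
    shift = exponent & 0x3F    # low 6 bits: position inside that drawer
    chunk = mantissa << shift  # at most 106 + 63 = 169 bits wide
    carry = 0
    # Add the shifted summand into the three consecutive drawers.
    for k in range(3):
        i = address + k
        total = drawers[i] + (chunk & MASK) + carry
        drawers[i] = total & MASK
        carry = total >> WORD_BITS
        chunk >>= WORD_BITS
    # Simple ripple: pass any remaining carry to more significant drawers.
    i = address + 3
    while carry:
        if i == NUM_WORDS:
            raise OverflowError("accumulator overflow")
        total = drawers[i] + carry
        drawers[i] = total & MASK
        carry = total >> WORD_BITS
        i += 1


def accumulator_value(drawers):
    """Read the drawers as one long integer (drawer 0 is least significant)."""
    return sum(word << (WORD_BITS * i) for i, word in enumerate(drawers))
```

Because every summand lands in a fixed position determined only by its exponent, no alignment against a running sum and no special cases are needed; the result is the exact integer sum of all summands.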

A carry is absorbed by the next more significant drawer in which not all bits are 1. For fast detection of this word, a flag is attached to each drawer. It is set to 1 if all bits of the word are 1, which means that a carry would propagate through the entire word. As soon as the exponent of the summand is available, the flags allow the carry word to be selected and incremented. This can be done simultaneously with adding the summand into the selected drawers. If the addition produces a carry, the incremented word is written into the carry word and the all-ones drawers in between become zero. Otherwise everything is left as it was. A zero flag may serve the same purpose in the case of subtraction.
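The flag mechanism can be modeled sequentially in Python as a rough sketch. The class name `FlaggedAccumulator` and its methods are hypothetical, summands are assumed non-negative (the zero flag for subtraction is not modeled), and the "simultaneous" selection and increment of the carry word is imitated by computing it before the three-drawer addition and committing it only if that addition carries out.

```python
WORD_BITS = 64
NUM_WORDS = 67
MASK = (1 << WORD_BITS) - 1


class FlaggedAccumulator:
    """Hypothetical model of the chest of drawers with an all-ones flag per drawer."""

    def __init__(self):
        self.drawers = [0] * NUM_WORDS
        self.all_ones = [False] * NUM_WORDS

    def _set(self, i, word):
        # Keep each drawer's flag consistent with its contents.
        self.drawers[i] = word
        self.all_ones[i] = word == MASK

    def add(self, mantissa, exponent):
        """Add mantissa * 2**exponent (non-negative, 106-bit mantissa, 12-bit exponent)."""
        address = exponent >> 6
        shift = exponent & 0x3F
        # Select the carry word from the flags alone: the first drawer above
        # the three target drawers whose bits are not all 1.
        carry_idx = address + 3
        while carry_idx < NUM_WORDS and self.all_ones[carry_idx]:
            carry_idx += 1
        # Pre-compute the incremented carry word; it cannot exceed MASK.
        incremented = None if carry_idx == NUM_WORDS else self.drawers[carry_idx] + 1
        # Add the shifted summand into the three selected drawers.
        chunk = mantissa << shift
        carry = 0
        for k in range(3):
            i = address + k
            total = self.drawers[i] + (chunk & MASK) + carry
            self._set(i, total & MASK)
            carry = total >> WORD_BITS
            chunk >>= WORD_BITS
        # Commit the increment only if the addition produced a carry.
        if carry:
            if incremented is None:
                raise OverflowError("accumulator overflow")
            for j in range(address + 3, carry_idx):
                self._set(j, 0)  # all-ones drawers wrap around to zero
            self._set(carry_idx, incremented)

    def value(self):
        """Read the drawers as one long integer (drawer 0 is least significant)."""
        return sum(w << (WORD_BITS * i) for i, w in enumerate(self.drawers))
```

The carry out of the three-drawer addition is at most 1, so a single increment of the selected carry word always suffices; the search over the flags does not depend on the result of the addition, which is what allows both to proceed in parallel in hardware.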

There is indeed no simpler way of accumulating a dot product. Any method that computes only an approximation also has to take the relative magnitudes of the summands into account, which results in a more complicated method.

The fascinating property of the exact dot product (Motion 9) is that it can be computed with extreme speed, ideally in the time the processor needs to read the data. No special cases have to be dealt with. The technique of adding a medium-sized bit string to a very long one may have applications in other areas of computing as well.

All conventional vector processors provide a 'multiply and accumulate' operation (the dot product) to achieve high speed. In a pipeline, the arithmetic (the multiplication and the accumulation) is done in the time the processor needs to read the data. Very effective vectorizing compilers have been developed that use this 'multiply and accumulate' operation within a user's program as often as possible, since this greatly speeds up program execution. However, the accumulation is done in floating-point arithmetic. The so-called 'partial sum technique' alters the sequence of the summands and causes errors in addition to the usual floating-point errors. The exact dot product avoids all of these numerical errors, and does so at the same speed. The hardware needed for it is comparable to that of a fast Wallace-tree multiplier, which was accepted years ago. In speed, a hardware implementation of the exact dot product exceeds computing an accurately rounded dot product in software by several orders of magnitude.
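A small Python illustration of the numerical point may be useful. `float_dot` is a conventional floating-point multiply-and-accumulate loop, and `exact_dot` (a hypothetical name) accumulates exactly using Python's `fractions.Fraction` as a software stand-in for a long accumulator, rounding only once at the end. This is not the hardware scheme of Motion 9, and being software it is slow, which is exactly the gap the message refers to.

```python
from fractions import Fraction


def float_dot(x, y):
    """Conventional multiply-and-accumulate in double precision."""
    acc = 0.0
    for a, b in zip(x, y):
        acc += a * b  # each addition is rounded
    return acc


def exact_dot(x, y):
    """Accumulate the products exactly, then round once to the nearest double."""
    acc = Fraction(0)
    for a, b in zip(x, y):
        acc += Fraction(a) * Fraction(b)  # Fraction(float) is exact
    return float(acc)  # single rounding of the exact result


x = [1e20, 1.0, -1e20]
y = [1.0, 1.0, 1.0]
print(float_dot(x, y))                    # 0.0: the 1.0 is lost against 1e20
print(exact_dot(x, y))                    # 1.0
print(float_dot([1e20, -1e20, 1.0], y))   # 1.0: reordering changes the float result
```

The last line shows why altering the sequence of the summands, as the partial sum technique does, changes the floating-point result, while the exact accumulation is independent of the order.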


Best wishes and a happy holiday season
Ulrich Kulisch

A disappointing feature is the failure of the numerical analysts to influence computer hardware and software in the way they should. It is often said that the use of computers for scientific work represents a small part of the market and numerical analysts have resigned themselves to accepting facilities "designed" for other purposes and making the best of them.

J. H. Wilkinson: Turing Lecture 1970, J. ACM 18 (1971), 146.
