
Re: discussion period begins, until Jan. 26: "natural interval extension": friendly amendment to M001.02



Dear David,

On 26.01.2016 at 15:26, David Lester wrote:
Dear Ulrich,

You appear to have forgotten our previous discussion.

What you are actually trading is latency on interrupts vs speed on 
a highly specialised bit of hardware.

If the EDP is to be a general operation it has to be possible
for the code to be interrupted. If this code is to be re-entrant
then the entire 1024 bit accumulator(s?) have to be flushed
to interrupt stack.
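
A minimal Python sketch of what this re-entrancy requirement implies. It rests on assumptions made for this example only. The accumulator is modeled as a Python integer that holds every product of two finite doubles, scaled exactly by 2**2148. `spill`, `restore` and `interrupted_dot` are hypothetical names, not part of any proposed standard or hardware design. The resulting register size (533 bytes) follows from these assumptions alone and is not meant to describe any particular implementation. The point is that a context switch must save and restore the whole register, and that doing so leaves the result unchanged.

```python
# Assumptions of this sketch only: finite IEEE-754 doubles, products kept
# exactly as integers scaled by 2**SCALE_BITS.
SCALE_BITS = 2 * 1074          # every product of two doubles is a multiple of 2**-2148
GUARD_BITS = 64                # headroom for up to 2**64 products without overflow
ACC_BITS = SCALE_BITS + 2048 + GUARD_BITS + 1   # |product| < 2**2048, plus a sign bit
ACC_BYTES = (ACC_BITS + 7) // 8                 # size of the state that must be spilled


def scaled_product(x: float, y: float) -> int:
    """Return x * y * 2**SCALE_BITS exactly, as an integer."""
    nx, dx = x.as_integer_ratio()   # dx is a power of two no larger than 2**1074
    ny, dy = y.as_integer_ratio()
    return nx * ny * ((1 << SCALE_BITS) // (dx * dy))


def accumulate(acc: int, xs, ys, start: int, stop: int) -> int:
    """Add products start..stop-1 into the accumulator without any rounding."""
    for i in range(start, stop):
        acc += scaled_product(xs[i], ys[i])
    return acc


def spill(acc: int) -> bytes:
    """Save the whole accumulator to a fixed-size frame (the 'interrupt stack')."""
    return acc.to_bytes(ACC_BYTES, "little", signed=True)


def restore(frame: bytes) -> int:
    """Reload the accumulator from a saved frame."""
    return int.from_bytes(frame, "little", signed=True)


def interrupted_dot(xs, ys, interrupt_at: int):
    """Exact dot product that is 'interrupted' after interrupt_at products."""
    acc = accumulate(0, xs, ys, 0, interrupt_at)
    frame = spill(acc)               # context switch: the full register is saved...
    acc = restore(frame)             # ...and reloaded when the computation resumes
    acc = accumulate(acc, xs, ys, interrupt_at, len(xs))
    return acc / (1 << SCALE_BITS), len(frame)   # one rounding at the very end
```

For example, `interrupted_dot([1e20, 1.0, -1e20], [1.0, 1.0, 1.0], 2)` returns `(1.0, 533)`. The interruption does not affect the result, but in this sketch every context switch moves 533 bytes of accumulator state.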

In my book Computer Arithmetic and Validity, the possibility of interrupting a dot product computation is considered. See Figures 8.17 and 8.18 in the second edition and the text around these figures.

However, I am of the opinion that a dot product computation never should and never needs to be interrupted. I repeat from my mail below:

The simplest and fastest way for computing a dot product is to compute it exactly. By pipelining, it can be computed in the time the processor needs to read the data, i.e., it comes with utmost speed. 

So the question is: would you interrupt reading the data into the processor? I think that if another computation really has higher priority (which I doubt), it would be better to place the interrupt before the dot product operation.

See another remark below.

While we are discussing pre-historic processors (Intel? Really?)
the reason that RISC is better than CISC is that small instructions
are more easily interruptible. This — if you recall — was the
problem with the RET instruction on the VAX-780 (where it was actually
more efficient to insert the register saves yourself, rather than use
the RET instruction) which for real-time responsiveness needed to
be interruptible.

Similarly, the students at Berkeley certainly showed that _considered_
_as_a_non-interruptible_instruction_ the EDP was 6 times more
efficient than an Intel processor (or as we say in Manchester, twice as
efficient as a modern processor). But the paper was a little vague
on how this meshes with the rest of the instruction set.

(I should say that getting the interrupts to work properly is what
keeps processor architects awake at night. I have some horror
stories from the MIPS architecture.)

On the matter of modern supercomputer design, it is not the
case that we have silicon to burn. The major issue is power
consumption, and how to get the heat out afterwards. An
Exascale supercomputer could be built now from current
technology. The problems would be: bisection bandwidth
(basically how efficiently data could be moved around the
machine), and the inability of anyone to pay the electricity
bills (order US $100M pa)!

The other issue is the instruction mix we
should expect. The vast majority of supercomputer use
today is concerned with “big data” and not numeric
simulation. My friends at FZJ Jülich estimate that less
than 1% of the instructions executed on JUQUEEN
during a numeric simulation are FP
instructions.


In short, a modern supercomputer is more often than not
just a data-mining tool, not the sort of number cruncher
that Cray would have produced.

Anyway, wouldn’t an EDP instruction requirement fit more
naturally into IEEE-754 if it’s needed at all?

Yes, it was requested during the IEEE 754R revision and had the support of Dan Zuras. But at that time the majority of IEEE 754R members did not know how to implement it efficiently, so the reduction operations were put in instead.

Regrettably, IEEE 1788 seems to be making the same mistake nine years later.

I wonder why Dan Zuras does not contribute to the discussion anymore.

Best regards
Ulrich

Dave


On 26 Jan 2016, at 12:45, Ulrich Kulisch <ulrich.kulisch@xxxxxxx> wrote:

This discussion gives speed a higher priority than accuracy, but that may lead to a wrong result. From the mathematical point of view, accuracy must have the higher priority. In the end, this also leads to the faster solution, since no error analysis is necessary.
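
A small Python illustration of how a conventional floating-point dot product can return a wrong result. The vectors are chosen only for illustration. Rational arithmetic with `fractions.Fraction` merely stands in for an exact dot product here; it is not how EDP hardware works.

```python
from fractions import Fraction


def naive_dot(xs, ys):
    """Conventional floating-point dot product: every multiply and add is rounded."""
    acc = 0.0
    for x, y in zip(xs, ys):
        acc += x * y
    return acc


def reference_dot(xs, ys):
    """Exact dot product via rational arithmetic, rounded once at the end."""
    return float(sum((Fraction(x) * Fraction(y) for x, y in zip(xs, ys)), Fraction(0)))


xs = [1e20, 1.0, -1e20]
ys = [1.0, 1.0, 1.0]
print(naive_dot(xs, ys))      # 0.0  (the 1.0 is absorbed by 1e20, then cancelled)
print(reference_dot(xs, ys))  # 1.0
```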
 
In Numerical Analysis the dot product is ubiquitous. It is not merely a fundamental operation in all vector and matrix spaces. It is the exact dot product which makes residual correction or iterative refinement effective. 
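
A sketch of iterative refinement in which only the residual b - Ax is computed exactly. `fractions.Fraction` is assumed as a stand-in for an EDP unit. `solve`, `exact_residual` and `refine` are hypothetical helpers. The 2x2 matrix is chosen only for illustration: its determinant is -1, so the exact solution of A x = (1, 0) is (-998, 999). Elimination stays in ordinary double precision throughout.

```python
from fractions import Fraction


def solve(A, b):
    """Gaussian elimination with partial pivoting in ordinary double precision.

    For brevity this refactorizes on every call; a real implementation
    would reuse the LU factors across refinement steps.
    """
    n = len(b)
    M = [list(map(float, row)) + [float(bi)] for row, bi in zip(A, b)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= f * M[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x


def exact_residual(A, b, x):
    """r = b - A x with each component computed exactly, then rounded once."""
    return [float(Fraction(bi) - sum(Fraction(aij) * Fraction(xj) for aij, xj in zip(row, x)))
            for row, bi in zip(A, b)]


def refine(A, b, iterations=10):
    """Residual correction: x <- x + solve(A, r) with an exactly computed residual."""
    x = solve(A, b)
    for _ in range(iterations):
        r = exact_residual(A, b, x)
        if not any(r):          # residual is exactly zero: nothing left to correct
            break
        d = solve(A, r)
        x = [xi + di for xi, di in zip(x, d)]
    return x


A = [[1000.0, 999.0], [999.0, 998.0]]   # det = -1, condition number about 4e6
b = [1.0, 0.0]
x = refine(A, b)
print(x)
```

Because the exact solution happens to be representable in this example, the refined result recovers it exactly, `[-998.0, 999.0]`. The first plain `solve` is typically off in the trailing digits.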
 
The simplest and fastest way for computing a dot product is to compute it exactly. By pipelining, it can be computed in the time the processor needs to read the data, i.e., it comes with utmost speed. Pipelining is another way of parallel processing. 
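
One way to read "compute it exactly" in software is a minimal Python sketch of a long fixed-point accumulator, assuming finite IEEE-754 doubles. An unbounded Python `int` stands in for the wide hardware register, so the sketch shows only the arithmetic idea (no rounding until a single final conversion), not the pipelining or the speed. `scaled_product` and `exact_dot` are hypothetical names.

```python
# Every finite double is an integer multiple of 2**-1074 (the smallest subnormal),
# so every product of two doubles is an integer multiple of 2**-2148.
SCALE_BITS = 2 * 1074


def scaled_product(x: float, y: float) -> int:
    """Return x * y * 2**SCALE_BITS exactly, as an integer (finite inputs only)."""
    nx, dx = x.as_integer_ratio()   # x == nx / dx, dx a power of two <= 2**1074
    ny, dy = y.as_integer_ratio()
    return nx * ny * ((1 << SCALE_BITS) // (dx * dy))


def exact_dot(xs, ys) -> float:
    """Accumulate all products without rounding, then round once to double."""
    if len(xs) != len(ys):
        raise ValueError("vectors must have the same length")
    acc = 0  # plays the role of the long fixed-point accumulator
    for x, y in zip(xs, ys):
        acc += scaled_product(x, y)
    # int / int true division is correctly rounded in CPython.
    return acc / (1 << SCALE_BITS)
```

For example, `exact_dot([1e308, 3.0, -1e308], [10.0, 0.1, 10.0])` returns `3.0 * 0.1` (0.30000000000000004). Accumulating the same products in doubles overflows to `inf` on the first product and ends as `nan`.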

Recently, two students at Berkeley have shown that the EDP can be computed in 1/6 of the time the Intel processor needs to compute a possibly wrong result in conventional floating-point arithmetic. The hardware needed for the EDP is comparable to that needed for a fast multiplier using an adder tree, which was accepted years ago and is now standard technology in every modern processor. The exact dot product brings the same speedup for accumulations at comparable cost.

Best regards
Ulrich


-- 
Karlsruher Institut für Technologie (KIT)
Institut für Angewandte und Numerische Mathematik
D-76128 Karlsruhe, Germany
Prof. Ulrich Kulisch
KIT Distinguished Senior Fellow

Telephone: +49 721 608-42680
Fax: +49 721 608-46679
E-Mail: ulrich.kulisch@xxxxxxx
www.kit.edu
www.math.kit.edu/ianm2/~kulisch/

KIT - University of the State of Baden-Württemberg
and National Research Center of the
Helmholtz Association