Re: discussion period begins, until Jan. 26: "natural interval extension

Dear David,

On 26.01.2016 at 15:26, David Lester wrote:

Dear Ulrich,

You appear to have forgotten our previous discussion.

What you are actually trading is latency on interrupts vs speed on a highly specialised bit of hardware.

If the EDP is to be a general operation, it has to be possible for the code to be interrupted. If this code is to be re-entrant, then the entire 1024 bit accumulator(s?) have to be flushed to the interrupt stack.

In my book Computer Arithmetic and Validity, the possibility of interrupting a dot product computation is considered. See Figures 8.17 and 8.18 in the second edition and the text around these figures.

However, I am of the opinion that a dot product computation never should and never needs to be interrupted. I repeat from my mail below:

The simplest and fastest way for computing a dot product is to compute it exactly. By pipelining, it can be computed in the time the processor needs to read the data, i.e., it comes with utmost speed.

So the question is: would you interrupt reading the data into the processor? I think that if another computation really has higher priority (which I doubt), it would be better to place the interrupt before the dot product operation.

See another remark below.

While we are discussing pre-historic processors (Intel? Really?), the reason that RISC is better than CISC is that small instructions are more easily interruptible. This, if you recall, was the problem with the RET instruction on the VAX-780 (where it was actually more efficient to insert the register saves yourself, rather than use the RET instruction), which for real-time responsiveness needed to be interruptible.

Similarly, the students at Berkeley certainly showed that _considered_as_a_non-interruptible_instruction_ the EDP was 6 times more efficient than an Intel processor (or as we say in Manchester, twice as efficient as a modern processor). But the paper was a little vague on how this meshes with the rest of the instruction set.

(I should say that getting the interrupts to work properly is what keeps processor architects awake at night. I have some horror stories from the MIPS architecture.)

On the matter of modern supercomputer design, it is not the case that we have silicon to burn. The major issue is power consumption, and how to get the heat out afterwards. An Exascale supercomputer could be built now from current technology. The problems would be: bisection bandwidth (basically how efficiently data could be moved around the machine), and the inability of anyone to pay the electricity bills (order US $100M pa)!

The other issue is the instruction mix we should expect. The vast majority of supercomputer use today is concerned with “big data” and not numeric simulation. My friends at FZJ Jülich estimate that less than 1% of the instructions executed on the JUQUEEN during a numeric simulation are FP instructions.

http://www.fz-juelich.de/ias/jsc/EN/Expertise/Supercomputers/JUQUEEN/JUQUEEN_node.html

In short, a modern supercomputer is more often than not just a data-mining tool, not the sort of number cruncher that Cray would have produced.

Anyway, wouldn’t an EDP instruction requirement fit more naturally into IEEE-754 if it’s needed at all?

Yes, it was required in IEEE 754R and found the support of Dan Zuras. But the majority of IEEE 754R members did not know how to do it efficiently at that time. So the reduction operations were put in instead.

Regrettably, IEEE 1788 seems to be making the same mistake nine years later.

I wonder why Dan Zuras does not contribute to the discussion anymore.

Best regards
Ulrich

Dave

On 26 Jan 2016, at 12:45, Ulrich Kulisch <ulrich.kulisch@xxxxxxx> wrote:

This discussion gives speed a higher priority than accuracy. But it may lead to a wrong result. From the mathematical point of view accuracy must have the higher priority. In the end this leads to the faster solution. No error analysis is necessary.

In Numerical Analysis the dot product is ubiquitous. It is not merely a fundamental operation in all vector and matrix spaces. It is the exact dot product which makes residual correction or iterative refinement effective.

The simplest and fastest way for computing a dot product is to compute it exactly. By pipelining, it can be computed in the time the processor needs to read the data, i.e., it comes with utmost speed. Pipelining is another way of parallel processing.
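
To make the accuracy point concrete, here is a minimal software sketch in Python. It illustrates only the exactness of the result, not the pipelined hardware accumulator described above. The function names `naive_dot` and `exact_dot` are hypothetical, and the exact accumulation is emulated with rational arithmetic from the standard `fractions` module (an assumption made for illustration, not the EDP hardware design): every product is accumulated without rounding, and the sum is rounded once at the end.

```python
from fractions import Fraction


def naive_dot(xs, ys):
    """Conventional floating-point dot product: rounds after every operation."""
    acc = 0.0
    for x, y in zip(xs, ys):
        acc += x * y
    return acc


def exact_dot(xs, ys):
    """Dot product accumulated exactly, with a single rounding at the end.

    Fraction(x) converts a float to the rational number it represents
    without error, so the products and the running sum are exact.
    """
    acc = Fraction(0)
    for x, y in zip(xs, ys):
        acc += Fraction(x) * Fraction(y)
    return float(acc)  # the only rounding step


# Illustrative data with heavy cancellation (chosen for this sketch).
xs = [1e16, 1.0, -1e16]
ys = [1.0, 1.0, 1.0]

print(naive_dot(xs, ys))  # 0.0: the 1.0 is lost when added to 1e16
print(exact_dot(xs, ys))  # 1.0: the correctly rounded result
```

The conventional loop loses the middle term entirely because 1e16 + 1.0 rounds back to 1e16 in double precision, while the exact accumulation returns the true value. Rational arithmetic is far slower than a fixed-point accumulator; it is used here only because it is easy to check.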

Recently, two students at Berkeley have shown that the EDP can be computed in 1/6 of the time the Intel processor needs to compute a possibly wrong result in conventional floating-point arithmetic. The hardware needed for the EDP is comparable to that for a fast multiplier by an adder tree, accepted years ago and now standard technology in every modern processor. The exact dot product brings the same speedup for accumulations at comparable costs.

Best regards
Ulrich

-- 
Karlsruher Institut für Technologie (KIT)
Institut für Angewandte und Numerische Mathematik
D-76128 Karlsruhe, Germany
Prof. Ulrich Kulisch
KIT Distinguished Senior Fellow

Phone: +49 721 608-42680
Fax: +49 721 608-46679
E-Mail: ulrich.kulisch@xxxxxxx
www.kit.edu
www.math.kit.edu/ianm2/~kulisch/

KIT - Universität des Landes Baden-Württemberg 
und nationales Großforschungszentrum in der 
Helmholtz-Gesellschaft
