Re: discussion period begins, until Jan. 26: "natural interval extension

(I'm one of the Berkeley students that implemented the hardware EDP accelerator)

It seems to me that an interruptible EDP is absolutely necessary. As you point out, we definitely need to allow the OS to preempt the currently running thread; moreover, loads and stores issued by the EDP unit itself could raise exceptions of their own. I don't think it is particularly difficult to support restartable or precise exception behavior in the unit. Yes, this might require writing the state of the accumulator out to memory, but that is something we already need to support for the rest of the machine's architectural state. For more aggressive machines with vector or wide packed-SIMD extensions, the register files hold much more state than the accumulator does. We just need to handle the accumulator as we would vector register state.
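
To make the restartability argument concrete, here is a minimal software sketch, not a description of the Berkeley accelerator. It assumes the unit's architectural state is just an element index plus the accumulator, and it uses Python's `Fraction` as a stand-in for the long fixed-point accumulator. The names `EdpState`, `edp_step`, `edp` and the `budget` parameter are hypothetical.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass
class EdpState:
    """State of a hypothetical EDP unit (illustrative names only)."""
    index: int = 0               # next element pair to process
    acc: Fraction = Fraction(0)  # stands in for the long accumulator


def edp_step(xs, ys, state, budget):
    """Process at most `budget` element pairs, then return so the caller
    (playing the role of the OS) can preempt. Returns True when finished."""
    end = min(state.index + budget, len(xs))
    for i in range(state.index, end):
        # Fraction(float) is exact, so neither the product nor the sum rounds.
        state.acc += Fraction(xs[i]) * Fraction(ys[i])
    state.index = end
    return state.index == len(xs)


def edp(xs, ys, budget=2):
    """Run to completion, spilling and reloading the state between slices."""
    state = EdpState()
    while True:
        done = edp_step(xs, ys, state, budget)
        # Simulated context switch: save the state "to memory", then restore.
        saved = (state.index, state.acc)
        state = EdpState(*saved)
        if done:
            return float(state.acc)  # one rounding, at the very end
```

With `xs = [1e16, 1.0, -1e16]` and `ys = [1.0, 1.0, 1.0]`, `edp(xs, ys, budget=1)` returns 1.0 regardless of the budget, because the saved state carries everything needed to resume. Plain left-to-right double accumulation loses the 1.0.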

Indeed, I don't think there will be a place for EDP in small cores (e.g. the 35K-transistor cores you mentioned). However, these microcontrollers often lack even a floating-point unit (for precisely the reason you gave).

- D

On Wed, Jan 27, 2016 at 3:25 AM, David Lester <dlester@xxxxxxxxxxxx> wrote:

Ulrich,

If I understand what you have previously written, what you envisage is that an EDP for an arbitrary length pair of vectors will be executed to completion without interrupts. This makes the execution of real-time devices impossible. It also messes up the OS rendering of the screen on your desk-top/lap-top.

So, what I think you want is that there is the possibility of interrupts after each pair of float/double data reads. In which case, for a general purpose supercomputer, the long accumulator needs to be flush-able so that the Operating System scheduler can schedule someone else’s EDP-ridden program. And so forth. And because there will always be only a limited number of long accumulators on a processor, this will inevitably, in the worst case, cause flushing to main-memory.

A typical minimal modern processor (ARM6) has only 35,000 transistors. Each bit of SRAM (your long accumulator) has 6 transistors. Thus just six of your long accumulators will _double_ the size of the core’s foot-print.

Personally I’d go with the idea of using all that extra SRAM in a more general way as caches or scratchpad, but that’s just me.
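
The transistor estimate above can be checked with a short calculation. This is only a sketch: it assumes a 1024-bit accumulator, the width given in the earlier message quoted further down this thread, and it counts SRAM storage cells only, ignoring adders, shifters and control logic. The function and constant names are illustrative.

```python
TRANSISTORS_PER_SRAM_BIT = 6  # 6T SRAM cell, as stated in the message
CORE_TRANSISTORS = 35_000     # minimal-core figure quoted in the message
ACCUMULATOR_BITS = 1024       # assumption: width from the quoted earlier message


def accumulator_transistors(bits=ACCUMULATOR_BITS):
    """Transistors for the storage cells of one long accumulator."""
    return bits * TRANSISTORS_PER_SRAM_BIT


def relative_cost(n_accumulators, bits=ACCUMULATOR_BITS):
    """Accumulator storage as a fraction of the core's transistor count."""
    return n_accumulators * accumulator_transistors(bits) / CORE_TRANSISTORS


print(accumulator_transistors())  # 6144
print(relative_cost(6))           # about 1.05
```

Under these assumptions, six accumulators cost 36,864 transistors, slightly more than the 35,000-transistor core itself, which matches the "double the foot-print" claim.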

The alternative is that we are considering a specialised piece of single-user hardware which is there only for Matrix/Vector processing. If that’s the case then by all means build it; it won’t cost much, say $0.5-1M. A sort of bolt-on hardware accelerator, as it were.

But I’m struggling to see the usefulness of EDP in a standard for general purpose processors.

Dave Lester

On 27 Jan 2016, at 09:45, Ulrich Kulisch <ulrich.kulisch@xxxxxxx> wrote:

Dear David,

On 26.01.2016 at 15:26, David Lester wrote:

Dear Ulrich,

You appear to have forgotten our previous discussion.

What you are actually trading is latency on interrupts vs speed on a highly specialised bit of hardware.

If the EDP is to be a general operation, it has to be possible for the code to be interrupted. If this code is to be re-entrant, then the entire 1024-bit accumulator(s?) have to be flushed to the interrupt stack.

In my book Computer Arithmetic and Validity, the possibility of interrupting a dot product computation is considered. See Figures 8.17 and 8.18 in the second edition and the text around these figures.

However, I am of the opinion that a dot product computation never should be, and never needs to be, interrupted. I repeat from my mail below:

The simplest and fastest way for computing a dot product is to compute it exactly. By pipelining, it can be computed in the time the processor needs to read the data, i.e., it comes with utmost speed.

So the question is: would you interrupt reading the data into the processor? If another computation really has higher priority (which I doubt), I think it would be better to place the interrupt before the dot product operation.

--
Karlsruher Institut für Technologie (KIT)
Institut für Angewandte und Numerische Mathematik
D-76128 Karlsruhe, Germany
Prof. Ulrich Kulisch
KIT Distinguished Senior Fellow
Phone: +49 721 608-42680
Fax: +49 721 608-46679
E-Mail: ulrich.kulisch@xxxxxxx
www.kit.edu
www.math.kit.edu/ianm2/~kulisch/
KIT - Universität des Landes Baden-Württemberg und nationales Großforschungszentrum in der Helmholtz-Gesellschaft
