Re: discussion period begins, until Jan. 26: "natural interval extension": friendly amendment to M001.02



Ulrich,

If I understand what you have previously written, what you envisage is that
an exact dot product (EDP) over a pair of vectors of arbitrary length will be
executed to completion without interrupts. This makes it impossible to service
real-time devices. It also messes up the OS rendering of the screen on your
desktop/laptop.

So, what I think you want is the possibility of interrupts after
each pair of float/double data reads. In that case, for a general-purpose
supercomputer, the long accumulator needs to be flushable so that the
operating system scheduler can schedule someone else’s EDP-ridden
program. And so forth. And because there will always be only a limited
number of long accumulators on a processor, this will inevitably, in
the worst case, cause flushing to main memory.
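
To make the point concrete, here is a rough software sketch of what "flushable" means. It is not a hardware design: the class LongAccumulator, its flush/restore methods and the function edp_interruptible are names made up for illustration, and Python's Fraction stands in for the long accumulator.

```python
from fractions import Fraction


class LongAccumulator:
    """Software stand-in for a hardware long accumulator (illustrative only)."""

    def __init__(self):
        self.value = Fraction(0)  # exact running sum, never rounded

    def accumulate(self, x, y):
        # Fraction(float) is exact, so the product and the sum are exact too.
        self.value += Fraction(x) * Fraction(y)

    def flush(self):
        """Hand back the full state, as a context switch would have to."""
        saved = self.value
        self.value = Fraction(0)
        return saved

    def restore(self, saved):
        """Reload a previously flushed state."""
        self.value = saved

    def result(self):
        # A single rounding, at the very end.
        return float(self.value)


def edp_interruptible(xs, ys, interrupt_every=1):
    """Exact dot product that permits a (simulated) interrupt after every
    `interrupt_every` operand pairs. Returns (result, number_of_flushes)."""
    acc = LongAccumulator()
    flushes = 0
    for i, (x, y) in enumerate(zip(xs, ys), start=1):
        acc.accumulate(x, y)
        if i % interrupt_every == 0:
            saved = acc.flush()   # accumulator given up to another task
            flushes += 1
            acc.restore(saved)    # and handed back when we are rescheduled
    return acc.result(), flushes
```

Each simulated interrupt moves the whole accumulator state out and back in. That is the traffic which, on real hardware with a limited number of accumulators, ends up in main memory in the worst case.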

A typical minimal modern processor (ARM6) has only 35,000 transistors.
Each bit of SRAM (your long accumulator) takes 6 transistors. Thus just six of
your long accumulators will _double_ the size of the core’s footprint.
Personally I’d go with the idea of using all that extra SRAM in a more
general way, as caches or scratchpad, but that’s just me.
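
For anyone who wants to check the arithmetic, a back-of-envelope sketch follows. The 1024-bit accumulator width is an assumption taken from the quoted message further down, the function name accumulator_overhead is made up, and only the storage cells are counted (no adders, shifters or control logic).

```python
def accumulator_overhead(core_transistors=35_000,   # ARM6 figure quoted above
                         transistors_per_bit=6,     # 6T SRAM cell, as quoted above
                         accumulator_bits=1024,     # assumed width (see lead-in)
                         num_accumulators=6):
    """Return (transistors per accumulator, total, total / core size)."""
    per_accumulator = accumulator_bits * transistors_per_bit
    total = num_accumulators * per_accumulator
    return per_accumulator, total, total / core_transistors


per_acc, total, ratio = accumulator_overhead()
print(per_acc)          # 6144
print(total)            # 36864
print(round(ratio, 2))  # 1.05
```

So six accumulators cost about as many transistors as the whole core, which is where the doubling comes from.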

The alternative is that we are considering a specialised piece of single-user
hardware which is there only for matrix/vector processing. If that’s the case
then by all means build it; it won’t cost much, say $0.5-1M. A sort of
bolt-on hardware accelerator, as it were.

But I’m struggling to see the usefulness of EDP in a standard for
general-purpose processors.

Dave Lester


On 27 Jan 2016, at 09:45, Ulrich Kulisch <ulrich.kulisch@xxxxxxx> wrote:

Dear David,

On 26.01.2016 at 15:26, David Lester wrote:
Dear Ulrich,

You appear to have forgotten our previous discussion.

What you are actually trading is latency on interrupts vs speed on 
a highly specialised bit of hardware.

If the EDP is to be a general operation, it has to be possible
for the code to be interrupted. If this code is to be re-entrant,
then the entire 1024-bit accumulator(s?) has to be flushed
to the interrupt stack.

In my book Computer Arithmetic and Validity, the possibility of interrupting a dot product computation is considered. See Figures 8.17 and 8.18 in the second edition and the text around these figures.

However, I am of the opinion that a dot product computation should never be interrupted and never needs to be. I repeat from my mail below:

The simplest and fastest way to compute a dot product is to compute it exactly. By pipelining, it can be computed in the time the processor needs to read the data, i.e., it runs at the utmost speed.
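
As a small illustration of what "exactly" buys, the sketch below compares an ordinary floating-point dot product with one that is accumulated exactly and rounded once. It is a software model only and says nothing about hardware speed. The function names dot_naive and dot_exact are made up, and Python's Fraction is used in place of a long accumulator.

```python
from fractions import Fraction


def dot_naive(xs, ys):
    """Ordinary double-precision dot product: rounds after every operation."""
    total = 0.0
    for x, y in zip(xs, ys):
        total += x * y
    return total


def dot_exact(xs, ys):
    """Accumulate every product exactly, then round once at the end."""
    total = Fraction(0)
    for x, y in zip(xs, ys):
        total += Fraction(x) * Fraction(y)  # exact: no rounding here
    return float(total)                     # the only rounding step


xs = [1e16, 1.0, -1e16]
ys = [1.0, 1.0, 1.0]
print(dot_naive(xs, ys))  # 0.0 (the 1.0 is lost when added to 1e16)
print(dot_exact(xs, ys))  # 1.0
```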

So the question is: would you interrupt reading the data into the processor? I think that if another computation really has higher priority (which I doubt), it would be better to place the interrupt before the dot product operation.