
Re: Please listen to Ulrich here...



Dear David,

thank you for your interesting mail. I fully agree with you that most computer applications do not need floating-point arithmetic and can be served adequately by processors far more energy efficient than those used in a PC today. Of course, these simpler processors can also be used for floating-point arithmetic and scientific computing, including interval arithmetic. In the late 1970s we implemented our PASCAL-XSC on the Z80, which provided only an 8-bit adder. I recommend that everybody have a look at this language! It was one of the more powerful programming languages available for scientific computing. But running it on the Z80 was certainly not energy efficient. So the question remains how scientific computing can be made more energy efficient.


< The problem as I see it is that Ulrich is designing a machine for the 1980s, and utterly failing to address the concerns of
< anyone actually designing or building a machine today. And that concern is energy.

My understanding of the situation is influenced not so much by the computers of the 1980s as by those used 50 years earlier. The better computers of the 1930s and before provided a long accumulator. It allowed error-free accumulation of numbers, and of products of numbers, into a wide fixed-point register. Several of these computers provided more than one such register. This method is simpler and more energy efficient than computing a dot product the conventional way in floating-point arithmetic. No intermediate normalizations and roundings have to be performed. No intermediate results have to be stored and read back for the next operation. No intermediate overflow or underflow can occur or has to be checked for. No error analysis is necessary. The result is always exact. The operations are just: multiply, shift, and add.
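In today's terms, this multiply-shift-add scheme can be sketched in a few lines. The following is only a toy illustration: Python's arbitrary-precision integers stand in for the wide fixed-point register, and the helper names and the BASE constant are choices made for the sketch, not part of any proposal.

```python
import math

def split(x):
    """Decompose a finite double into (m, e) with x == m * 2**e, m an integer."""
    m, e = math.frexp(x)            # x == m * 2**e with 0.5 <= |m| < 1 (or m == 0)
    return int(m * 2**53), e - 53

def exact_dot(xs, ys):
    """Exact dot product via a (conceptual) long fixed-point accumulator.

    Every product is shifted to a common binade and added without any
    intermediate rounding, normalization, overflow, or underflow --
    just multiply, shift, and add. Rounding happens once, at the end.
    """
    BASE = -2300                    # below the smallest possible product exponent
    acc = 0                        # the 'long accumulator'
    for x, y in zip(xs, ys):
        mx, ex = split(x)
        my, ey = split(y)
        acc += (mx * my) << (ex + ey - BASE)   # multiply, shift, add
    return acc / (1 << -BASE)       # one correctly rounded division at the end
```

For instance, `exact_dot([1.0, 1e100, 1.0, -1e100], [1.0, 1.0, 1.0, 1.0])` yields 2.0, where the naive floating-point loop loses both small terms to absorption and returns 0.0.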
With best regards
Ulrich




On 08.08.2013 at 11:41, David Lester wrote:
As a supercomputer designer (http://apt.cs.man.ac.uk/projects/SpiNNaker/SpiNNchip/), with a new
€1 billion funding stream as part of http://www.humanbrainproject.eu/, I had been intending to
sit this one out.

The problem as I see it is that Ulrich is designing a machine for the 1980s, and utterly failing
to address the concerns of anyone actually designing or building a machine today. And that concern
is energy.

To drive energy costs down low enough to hit the 20-25 MW power budget for exascale, we need to
maximise the number of compute cores (at the expense of their complexity and clock speed),
localise memory, and cut out extraneous operations. The significance of the energy budget is
that with today's cutting-edge Intel hardware, you would need on the order of $100 billion per
annum to pay the electricity bills.

Most designers use stock Intel chips, because there is generally thought to be insufficient
market for supercomputer chip development to be cost-effective. Because ARM is both
energy-efficient and a virtual foundry (and Steve is the original ARM designer), we have the
option to build a custom chip, unlike many of our competitors. ARM, by the way, is outselling
Intel to the extent that last year it sold over 15 billion (licensed) cores, which is more
than the entire chip production of all other manufacturers since 1967.

Talking within the SC community, it seems we are all converging on essentially the same solution:

(1) Energy efficient low-performance cores. 

(2) Co-packaged DRAM, including multiple layers of 3D stacking.

(3) Vectorized SIMD architecture.

(4) Paring down the complexity of the individual cores, and doing rarely used operations in
   software (or perhaps adding co-processors).

The argument against co-processors is that they consume energy, and die-area, that could be
more productively used to perform common rather than unusual computations.

For example, our partner team at FZ Jülich (the German supercomputing centre) has reported that
their current supercomputer neural simulations make frighteningly low use of the FPU.
Most of what is needed for our application area is memory and fast MPI. This is what
SpiNNaker is optimised for. It has no FP _at_all_, has local co-packaged DRAM, and a fast
packet-switched network optimised for the task in hand: lots and lots of very small packets
(4 bytes), rather than the moderate number of 1-2 KB packets that MPI assumes.

I'll say that again: no FPU. It's not heavily used, and it's cheaper and more energy
efficient to do a software simulation of float and double, which is a reasonable architectural
compromise. Obviously the LINPACK numbers are non-competitive, but there is a growing
realisation that LINPACK may not be very indicative of future supercomputing needs.

The question -- as I see it -- is not whether an exact dot product can be done in hardware, but whether it can be done cost-effectively. That is, in a world in which volume trumps everything, is there a large enough market for the hardware development costs to be amortised over likely sales? Otherwise, letting you clever people write a software emulation would seem the most cost-effective way to go.

I should point out how the current ARM co-processor architecture violates everything that Ulrich
believes is desirable. The FPU is not integrated into the main ALU pathway. Instead, the opcode is
trapped out as "unrecognised" and offered to the co-processors attached to the internal bus.
Software compatibility is maintained by raising an interrupt if the co-processor is missing and
executing a software simulation of the operation.

So my question is: is our proposed chip P1788-compliant? Provided that we do everything in software, I cannot see why it should not be. Am I missing something?

Are you requiring me to provide on-chip co-processors to be compliant? If so, how does this differ from off-chip interrupt handling for unrecognised op-codes? Are you mandating how the FPU is integrated into the ALU? If so, do you have sufficient current architectural experience to make sensible selections?

Regards,

Dave Lester

Advanced Processor Technology Group
The University of Manchester
Manchester M13 9PL

ps Baker: If I don't have current listserv permissions, please feel free to forward to the community.


-- 
Karlsruher Institut für Technologie (KIT)
Institut für Angewandte und Numerische Mathematik
D-76128 Karlsruhe, Germany
Prof. Ulrich Kulisch

Telefon: +49 721 608-42680
Fax: +49 721 608-46679
E-Mail: ulrich.kulisch@xxxxxxx
www.kit.edu
www.math.kit.edu/ianm2/~kulisch/

KIT - Universität des Landes Baden-Württemberg 
und nationales Großforschungszentrum in der 
Helmholtz-Gesellschaft