
Re: Please listen to Ulrich here...



Dear David,

thank you for your interesting mail. I fully agree with you that most computer applications do not need floating-point arithmetic and can be served adequately by processors far more energy efficient than those used in a PC today. Of course, these simpler processors can also be used for floating-point arithmetic and scientific computing, including interval arithmetic. In the late 1970s we implemented our PASCAL-XSC on the Z80, which provided only an 8-bit adder. I recommend that everybody have a look at this language! It was one of the more powerful programming languages available for scientific computing. But running it on the Z80 was certainly not energy efficient. So the question remains how scientific computing can be made more energy efficient.


< The problem as I see it is that Ulrich is designing a machine for the 1980s, and utterly failing to address the concerns of
< anyone actually designing or building a machine today. And that concern is energy.

My understanding of the situation is influenced not so much by the computers of the 1980s as by those used 50 years earlier. The better computers of the 1930s and before provided a long accumulator. It allowed error-free accumulation of numbers, and of products of numbers, into a wide fixed-point register. Several of these computers provided more than one such register. This method is simpler and more energy efficient than computing a dot product the conventional way in floating-point arithmetic. No intermediate normalizations and roundings have to be performed. No intermediate results have to be stored and read back for the next operation. No intermediate overflow or underflow can occur or has to be checked for. No error analysis is necessary. The result is always exact. The operations are just: multiply, shift, and add.
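In today's terms, this multiply-shift-add scheme can be sketched in a few lines. The following is only a toy illustration: Python's arbitrary-precision integers stand in for the wide fixed-point register, and the helper names and the BASE constant are choices made for the sketch, not part of any proposal.

```python
import math

def split(x):
    """Decompose a finite double into (m, e) with x == m * 2**e, m an integer."""
    m, e = math.frexp(x)            # x == m * 2**e with 0.5 <= |m| < 1 (or m == 0)
    return int(m * 2**53), e - 53

def exact_dot(xs, ys):
    """Exact dot product via a (conceptual) long fixed-point accumulator.

    Every product is shifted to a common binade and added without any
    intermediate rounding, normalization, overflow, or underflow --
    just multiply, shift, and add. Rounding happens once, at the end.
    """
    BASE = -2300                    # below the smallest possible product exponent
    acc = 0                        # the 'long accumulator'
    for x, y in zip(xs, ys):
        mx, ex = split(x)
        my, ey = split(y)
        acc += (mx * my) << (ex + ey - BASE)   # multiply, shift, add
    return acc / (1 << -BASE)       # one correctly rounded division at the end
```

For instance, `exact_dot([1.0, 1e100, 1.0, -1e100], [1.0, 1.0, 1.0, 1.0])` yields 2.0, where the naive floating-point loop loses both small terms to absorption and returns 0.0.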
With best regards
Ulrich




On 08.08.2013 at 11:41, David Lester wrote:
As a supercomputer designer (http://apt.cs.man.ac.uk/projects/SpiNNaker/SpiNNchip/), with a new
€1 billion funding stream as part of http://www.humanbrainproject.eu/, I had been intending to
sit this one out.

The problem as I see it is that Ulrich is designing a machine for the 1980s, and utterly failing
to address the concerns of anyone actually designing or building a machine today. And that concern
is energy.

To drive energy costs down low enough to hit the 20-25 MW power budget for exascale, we need to
maximise the number of compute cores (at the expense of their complexity and clock speed),
localise memory, and cut out extraneous operations. The significance of the energy budget is
that with today's cutting-edge Intel hardware, you would need on the order of $100 billion per
annum to pay the electricity bills.

Most designers use stock Intel chips, because there is generally thought to be insufficient
market for supercomputer chip development to be cost-effective. Because ARM is both
energy-efficient and a virtual foundry (and Steve is the original ARM designer), we have the
option to build a custom chip, unlike many of our competitors. ARM, by the way, is outselling
Intel to the extent that last year it sold over 15 billion (licensed) cores, which is more
than the entire chip production of all other manufacturers since 1967.

Talking within the SC community, it seems we are all converging on essentially the same solution:

(1) Energy efficient low-performance cores. 

(2) Co-packaged DRAM, including multiple layers of 3D stacking.

(3) Vectorized SIMD architecture.

(4) Paring down the complexity of the individual cores, and doing rarely used operations in
   software (or perhaps adding co-processors).

The argument against co-processors is that they consume energy, and die-area, that could be
more productively used to perform common rather than unusual computations.

For example, our partner team at FZ Jülich (the German supercomputing centre) has reported that
their current supercomputer neural simulations make frighteningly low use of the FPU.
Most of what is needed for our application area is memory and fast MPI. This is what
SpiNNaker is optimised for. It has no FP _at_all_, has local co-packaged DRAM, and a fast
packet-switched network optimised for the task in hand: lots and lots of very small packets
(4 bytes), rather than the moderate number of 1-2 KB packets that MPI assumes.

I'll say that again: no FPU. It's not heavily used, and it's cheaper and more energy
efficient to do a software simulation of float and double, which is a reasonable architectural
compromise. Obviously the LINPACK numbers are non-competitive, but there is a growing
realisation that LINPACK may not be very indicative of future supercomputing needs.

The question -- as I see it -- is not whether an exact dot product can be done in hardware, but whether it can be done cost-effectively. That is, in a world in which volume trumps everything, is there a large enough market for the hardware development costs to be amortised over likely sales? Otherwise, letting you clever people write a software emulation would seem the most cost-effective way to go.

I should point out how the current ARM co-processor architecture violates everything that Ulrich
believes is desirable. The FPU is not integrated into the main ALU pathway. Instead, the opcode is
trapped out as "unrecognised" and offered to the co-processors attached to the internal bus.
Software compatibility is maintained by raising an interrupt if the co-processor is missing and
executing a software simulation of the operation.

So my question is: is our proposed chip P1788-compliant? Provided that we do everything in software, I cannot see why it should not be. Am I missing something?

Are you requiring me to provide on-chip co-processors to be compliant? If so, how does this differ from off-chip interrupt handling for unrecognised op-codes? Are you mandating how the FPU is integrated into the ALU? If so, do you have sufficient current architectural experience to make sensible selections?

Regards,

Dave Lester

Advanced Processor Technology Group
The University of Manchester
Manchester M13 9PL

ps Baker: If I don't have current listserv permissions, please feel free to forward to the community.


-- 
Karlsruher Institut für Technologie (KIT)
Institut für Angewandte und Numerische Mathematik
D-76128 Karlsruhe, Germany
Prof. Ulrich Kulisch

Telefon: +49 721 608-42680
Fax: +49 721 608-46679
E-Mail: ulrich.kulisch@xxxxxxx
www.kit.edu
www.math.kit.edu/ianm2/~kulisch/

KIT - Universität des Landes Baden-Württemberg 
und nationales Großforschungszentrum in der 
Helmholtz-Gesellschaft