
Re: Requiring the EDP in 754 (or 1788.1 for that matter)



> On 31 Jan 2016, at 15:52, Lee Winter <lee.j.i.winter@xxxxxxxxx> wrote:
> 
> On Sun, Jan 31, 2016 at 9:50 AM, David Lester <dlester@xxxxxxxxxxxx> wrote:
> 
>> On 30 Jan 2016, at 23:03, Ulrich Kulisch <ulrich.kulisch@xxxxxxx> wrote:
>> 
>> But the real point is this:  A correctly rounded dot product is slower by at
>> least one order of magnitude than a possibly wrong computation of the dot
>> product in conventional floating-point arithmetic.
>> An exact dot product would be 6 times faster than the latter. So
>> the EDP is at least 60 times faster than a possibly wrong correctly rounded
>> dot product.
>> Speed and accuracy are essential for acceptance and success of interval
>> arithmetic.
>> 
> 
> My comments are interspersed below, mostly respectfully.
> 
>> 
>> No Ulrich, it is _you_ who is refusing to pay attention.
>> 
>> (1) The world of chip design has moved on from 1985. Then, it was a
>>      handful of specialists like Seymour Cray; now everyone can do it.
> 
> That is not a change.  Anyone could do it in 1964.  The fact that few
> did is not relevant to your premise.
> 
>> (2) We hit the “Wall” in the mid-1990s. You will never get processors
>>      with clock speeds of more than a few GHz at economic prices, unless
>>      there is a new non-silicon technology invented.
> 
> Yes.
> 
>> There’s nothing on the horizon yet.
> 
> No.  In this domain your ignorance is showing.  In fact there are
> several.  HP's commitment to memristors is very encouraging.  See
> https://en.wikipedia.org/wiki/Memristor.

Lee, I know all about memristors, thank you.
I am nowadays, after all, a neuromorphic chip
designer. The obvious first commercial use is
in digital memory circuits, so let’s see how
that works out before we look at the more
general case, eh?

Like graphene, it might work out in the end, but
for the next ten years or so, it’s probably
going to be silicon CMOS.


>> (3) Thus the real interest in chip design lies not in the uni-processor,
>>      but many-core processors (hundreds of CPUs on each one square
>>      centimetre die).
> 
> No.  Collectively we are already recycling (as scrap) GPUs with a
> thousand or more processors.  One hundred CISC processors per CPU chip
> is an uninteresting goal in that regard.  1e5 FPU processors per chip
> starts to get interesting.  Especially if those processors are not for
> typical (partial) implementations but for rigorous (complete)
> implementations.

The goal of hundreds is what we have on the drawing board at the
moment, at 28nm. Were we working with the right foundry, we might
have been looking at 14nm FinFET, but that was not to be.

However, we are agreeing that — looking forward — we will see
lots more computing elements per chip. As you say, thousands is
not unreasonable.

So parallelism is an important consideration, no?

>> (4) What I and David Baincolin from Berkeley have been pointing out
>>     is that the guts of what you wish to achieve can be achieved more
>>     economically using tweaks to stock chip designs, which do not
>>     have the bad side-effects your almost religious belief in a mandated
>>     hardware solution has.
> 
> I must have missed something.  Other than space and bandwidth for
> context switching (c.f. barrel register files that only require a base
> pointer reference to switch), what are the bad side effects of a
> mandated hardware solution?

Ulrich has been arguing for a non-interruptible EDP for vectors of
the order of millions of elements.
> 
> I suspect you might state the same kind of objection to modern CISC
> GDT and IDT tables, which are not only large, but variable length.
> 
>> (5) In particular, if we posit the implementation of 1024-bit add/sub
>>      instructions working on the upper and lower half of the ARM
>>      NEON SIMD unit registers, then far from a factor of 60 you
>>      mention, we’d be in the territory of about factor two or four,
>>      asymptotically.
> 
> The above statement needs a qualifier.  What are the asymptote
> sampling points?  100? 100,000? 4e9? 1e100?

Ignoring the need for context-switching, if you give me add/sub
on a pair of 1024-bit accumulators, then I can give you EDP as
Ulrich wants it within a factor two or four of the method he
proposes.
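
A rough software sketch of the accumulate-exactly, round-once idea
behind that claim might look like the following. It is illustrative
only and rests on assumptions that are not part of this thread:
binary64 inputs, Python's unbounded integers standing in for the wide
accumulator, and the made-up names exact_dot, naive_dot and SCALE_BITS.
It does not model a pair of 1024-bit registers, the add/sub
instructions, or context switching, and it says nothing about speed.

```python
# Every finite binary64 value is an integer multiple of 2**-1074, so every
# product of two such values is an integer multiple of 2**-2148.
SCALE_BITS = 2 * 1074  # assumption: binary64 inputs only


def exact_dot(xs, ys):
    """Dot product with exact accumulation and a single final rounding.

    The accumulator is a fixed-point integer in units of 2**-SCALE_BITS.
    Infinities and NaNs are not handled (as_integer_ratio raises on them).
    """
    acc = 0
    for x, y in zip(xs, ys):
        nx, dx = float(x).as_integer_ratio()  # x == nx / dx, dx a power of 2
        ny, dy = float(y).as_integer_ratio()
        frac_bits = (dx.bit_length() - 1) + (dy.bit_length() - 1)
        # Add the exact product x*y, aligned to the accumulator's scale.
        acc += (nx * ny) << (SCALE_BITS - frac_bits)
    # The only rounding step: integer true division rounds correctly.
    return acc / (1 << SCALE_BITS)


def naive_dot(xs, ys):
    """Conventional dot product: one rounding per multiply and per add."""
    total = 0.0
    for x, y in zip(xs, ys):
        total += x * y
    return total


xs = [1e16, 1.0, -1e16]
ys = [1.0, 1.0, 1.0]
print(naive_dot(xs, ys))  # 0.0, the 1.0 is lost when added to 1e16
print(exact_dot(xs, ys))  # 1.0
```

The point of the sketch is only that the inner loop is integer
shift-and-add on a wide register; how wide, and how it is saved and
restored, is exactly what is being argued about here.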

>>      However, this unit is an optional extra for high-end processors
>>      (CORTEX-A) and is not available on the billions of cores shipped
>>      each year which are usually CORTEX-M0, or CORTEX-M4.
> 
> Too bad for them.  Your observation is a backward-looking datum, which
> should not be used to constrain the future.

Of course. But the take-home message is that the usual case,
the one of economic interest, is small systems, and that for
IEEE-1788 at least (and this is where this discussion began),
mobile devices should not be precluded from running the standard.

>> (6) You appear to be working on the assumption that by mandating
>>      “hardware” EDP you will be expediting the appearance of said
>>     processors. This is not so. What will get them made is a good
>>     economic case.
> 
> Wrong.  C.f., MC68000 vs i8086.
> 
> Do you know why IBM chose the i8086 at a time when the MC68000 was
> winning over 70% of CPU design-ins (not just over Intel, but over all
> competing CPUs)?
> 
> Because the weaknesses of the Intel offering made it possible for IBM
> to purchase a substantial share of Intel's equity.  That ownership
> position gave IBM a lot of information and control.  THAT was the
> compelling "economic" case.
> 
> See also Mitre's idiocy in the selection of the fundamental
> communications architecture for JTIDS.  I claim that it is _never_
> useful to get the wrong answer quickly.  Much less the wrong answer
> slowly.
> 
> Please, let us not help that happen again.
> 
> Personally, I have _way_ too much experience wrestling with pre-'754
> numerical systems.  And I have no interest in a system that is not as
> rigorous.
> 
> IEEE '754-1985 was an unbelievably huge improvement.  Counting those
> blessings takes a while.  But was it perfect?  Certainly not.  It has
> terminology issues and unspecified behavior issues that are _still_
> extant today.  I'm not holding my breath.
> 
> However, there is simply no excuse for repeating those kinds of
> defects and omissions.  "Economic cases" be damned.  I want the "right
> answer" no matter what it takes.

We have no argument, here.

I too would prefer that ARM used the standard, and take every opportunity
to raise the subject with them. However, if I have understood Ulrich’s
objective correctly, it is that by mandating EDP in a standard, this
will lead to it being implemented in a widespread fashion.

I beg to differ, and offer the above as a counter-example.

>>      As you know (because we have discussed this before), ARM’s
>>      FPU is _not_ fully 754-R compliant. This is because their
>>      typical customer is not bothered by fully standards-compliant
>>      hardware.
> 
> OK, then we need a fuzzy "standard" for the "typical" customer and a
> rigorous standard for people who actually know something about the
> problem domain.
> 
> Do you approve of spreadsheets hiding rounding from users?  If so,
> please read more of Prof. Kahan's notes on such self-injuring
> policies.
> 
> Do you use MICROS~1's spreadsheet?  Do you know why open source
> spreadsheets offer two independent collections of functions?  One set
> helps people whose interest is dominated by getting the right answer.
> The second set helps people whose interest is dominated by getting a
> compatible, but wrong, answer.
> 
> Personally, I am not interested in the second set.
> 
>> 
>> (7) You have never specified what you mean by “hardware-implementation”.
>>      Is it permissible to soft-trap the EDP-instruction? Are co-processors
>>      permissible? Are there — as I suspect — execution time constraints?
>>      Your rough calculations a few days ago suggest you harbour the
>>      naive belief that each instruction is executed in a single
>> clock-cycle.
> 
> A valid point.
> 
>> 
>> (8) Finally you might wish to comment on the national security implications
>>     of your EDP proposal. As mentioned in (5) above you are perilously
>>     close to proposing extremely fast integer operations, especially if
>>     you want to combine two EDP calculations done in parallel (see (3)
>>     above). A chip with these features may well suffer from export
>>     licence restrictions — if not now, then in the near future. This
>>     surely has implications on the number of devices which could be sold.
> 
> That issue is irrelevant.  And it is mischaracterized.  There are no
> "national security implications" for hardware EDP.

So far.

>  I happen to know
> the ITAR restrictions pretty well.

It is not clear that I’d be working under those particular
constraints (check my email address).

>  As soon as a single hardware EDP
> design has been published in book or similar form, ITAR cannot be
> applied to that technology.

We are having trouble with UK and German export control for
neuromorphic chips. There the designs _have_ all been published
but the hardware is non-exportable.

> For verification of the above claim talk to any experienced cryptographer.
> 
>> Ignoring these facts is a terrible service to the standard.
> 
> Absolutely agreed. (But I am not sure that I attributed the above
> quote correctly).

Dave

> 
> Lee Winter
> Nashua, New Hampshire
> United States of America