-----Original Message-----
From: stds-754@xxxxxxxx [mailto:stds-754@xxxxxxxx] On Behalf Of Nick Maclaren
Sent: Sunday, May 27, 2007 11:55 AM
To: stds-754@xxxxxxxxxxxxxxxxx
Subject: Re: Specifying signs when f(0)=0
Ivan Godard <igodard@xxxxxxxxxxx> wrote:
Traps are in general impossible to implement in modern multi-core or
multiprocessor parallel hardware. The whole trap notion assumes a
sequential execution model, which is fine if you are willing to give up
many orders of magnitude of performance; those with access to such
hardware are not willing. The whole question is very architecture
dependent - traps are doable on a Beowulf cluster design for example,
but are absolutely hopeless on a grid machine.
I am sorry, but that is completely wrong.
What is impossible is to take a precise trap, pass control to user
code and then resume.
It is trivial to implement LIA-1 trap-diagnose-and-terminate, as
used to be the norm - and was the default on the Alpha. It isn't
even hard to make such traps precise, at the hardware level, though
it is at the software level (optimisation confuses the issue).
[ One can ask why that is not even a SPECIFIED alternate exception
handling mode in IEEE 754R, but I shall refrain from commenting. ]
What most people fail to realise is that precise trapping and
resumption is actually HOW the full semantics of IEEE 754 is
implemented today, on virtually all architectures except the
POWER. But only the kernel gets control.
In fact, it is pretty easy to implement imprecise traps, pass
control to user code and resume, but few people could handle it
when it was done and few languages have ever specified it. But
it isn't actually hard to do - just complicated and confusing.
So far, the only semantic solution to these problems that is both
architecturally viable on all known architectures and also solves the
mathematical difficulties is to carry a (larger) set of status flags
with each operand, with rules as to how the function and the argument
values and flags produce the result values and flags. In larger and more
powerful machines the implementation would directly represent the flags
in a long datum; in less powerful, more sequential machines (i.e.
desktops) the same semantics can be implemented using global flags in
dedicated registers and create-on-need memory.
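
To make the proposal above concrete, here is a minimal sketch of flags
carried with each operand. It is an illustration only, not anything
specified by IEEE 754 or 754R: the Flagged type, the flag encoding and
the add/div helpers are all hypothetical names invented for this example,
and only three of the exception flags are modelled.

```python
import math
from dataclasses import dataclass

# Hypothetical flag bits. The names echo IEEE 754 exceptions, but the
# encoding is an assumption made for this sketch.
INVALID, DIV_BY_ZERO, OVERFLOW = 1, 2, 4


@dataclass(frozen=True)
class Flagged:
    """A value that carries its own status flags (hypothetical type)."""
    value: float
    flags: int = 0


def _finish(value, a, b):
    # Result flags = union of the operand flags plus anything newly raised.
    operands = (a.value, b.value)
    raised = 0
    if math.isnan(value) and not any(math.isnan(v) for v in operands):
        raised |= INVALID      # e.g. inf - inf or inf / inf
    if math.isinf(value) and not any(math.isinf(v) for v in operands):
        raised |= OVERFLOW     # finite operands, infinite result
    return Flagged(value, a.flags | b.flags | raised)


def add(a, b):
    return _finish(a.value + b.value, a, b)


def div(a, b):
    if b.value == 0.0:
        # Python raises ZeroDivisionError here, so handle it by hand.
        inherited = a.flags | b.flags
        if math.isnan(a.value):
            return Flagged(math.nan, inherited)            # NaN propagates
        if a.value == 0.0:
            return Flagged(math.nan, inherited | INVALID)  # 0/0
        sign = math.copysign(1.0, a.value) * math.copysign(1.0, b.value)
        raised = DIV_BY_ZERO if math.isfinite(a.value) else 0
        return Flagged(sign * math.inf, inherited | raised)
    return _finish(a.value / b.value, a, b)


x = div(Flagged(1.0), Flagged(0.0))   # raises DIV_BY_ZERO on x only
y = add(x, Flagged(2.0))              # inherits x's flag through the data
z = add(Flagged(3.0), Flagged(4.0))   # unrelated computation, stays clean
```

Because the flags travel with the data, z is unaffected no matter where
it is evaluated relative to x and y; there is no shared state to order
the operations against.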
That is seriously misleading, almost to the point of being false.
The former approach (which is NOT used by IEEE 754) is viable; the
latter isn't. Not merely do global flags force a great deal of
serialisation, they are incompatible with almost all advanced
optimisations. A nightmare problem that I am trying to explain to
the C++ people at present, is that even purely thread-local status
flags need global synchronisation, because they can become globally
visible through their effects on other instructions.
And that is a real nightmare if you promise sequential consistency
(Lamport), because of the problem of independent reads of independent
writes. And Microsoft do just that ....
Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email: nmm1@xxxxxxxxx
Tel.: +44 1223 334761 Fax: +44 1223 334679