
Re: Specifying signs when f(0)=0



Ivan Godard <igodard@xxxxxxxxxxx> wrote:

> Traps are in general impossible to implement in modern multi-core or
> multiprocessor parallel hardware. The whole trap notion assumes a
> sequential execution model, which is fine if you are willing to give up
> many orders of magnitude of performance; those with access to such
> hardware are not willing. The whole question is very architecture
> dependent - traps are doable on a Beowulf cluster design for example,
> but are absolutely hopeless on a grid machine.

I am sorry, but that is completely wrong.

What is impossible is to take a precise trap, pass control to user
code and then resume.

It is trivial to implement LIA-1 trap-diagnose-and-terminate, as
used to be the norm - and was the default on the Alpha.  It isn't
even hard to make such traps precise, at the hardware level, though
it is at the software level (optimisation confuses the issue).
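
As a rough illustration of the trap-diagnose-and-terminate model only,
here is a software emulation in Python.  It is a sketch under stated
assumptions, not how any hardware or LIA-1 binding does it: the names
ArithmeticTrap, checked and run are hypothetical, and a "trap" is
approximated as "a non-finite result produced from finite operands".
The point is that control never returns to the faulting computation.

```python
import math
import operator
import sys


class ArithmeticTrap(Exception):
    """Hypothetical stand-in for a hardware floating-point trap."""


def checked(op_name, func, *args):
    """Apply func to args, trapping if a new exceptional result appears."""
    try:
        result = func(*args)
    except (ZeroDivisionError, OverflowError) as exc:
        raise ArithmeticTrap(f"{op_name}{args}: {exc}") from exc
    # A NaN or infinity that was not already present in the inputs
    # counts as a newly raised exceptional condition.
    inputs_finite = all(math.isfinite(a) for a in args)
    if inputs_finite and not math.isfinite(result):
        raise ArithmeticTrap(f"{op_name}{args}: non-finite result {result}")
    return result


def run(program):
    """Run program; on a trap, diagnose and terminate (no resumption)."""
    try:
        return program()
    except ArithmeticTrap as trap:
        print(f"arithmetic trap: {trap}", file=sys.stderr)
        raise SystemExit(1)
```

For example, run(lambda: checked("div", operator.truediv, 1.0, 0.0))
writes a diagnostic to stderr and exits with status 1, whereas
checked("mul", operator.mul, 2.0, 3.0) simply returns the product.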

[ One can ask why that is not even a SPECIFIED alternate exception
handling mode in IEEE 754R, but I shall refrain from commenting. ]

What most people fail to realise is that precise trapping and
resumption is actually HOW the full semantics of IEEE 754 is
implemented today, on virtually all architectures except the
POWER.  But only the kernel gets control.

In fact, it is pretty easy to implement imprecise traps, pass
control to user code and resume, but few people could handle it
when it was done and few languages have ever specified it.  But
it isn't actually hard to do - just complicated and confusing.

> So far, the only semantic solution to these problems that is both
> architecturally viable on all known architectures and also solves the
> mathematical difficulties is to carry a (larger) set of status flags
> with each operand, with rules as to how the function and the argument
> values and flags produce the result values and flags. In larger and more
> powerful machines the implementation would directly represent the flags
> in a long datum; in less powerful, more sequential machines (i.e.
> desktops) the same semantics can be implemented using global flags in
> dedicated registers and create-on-need memory.

That is seriously misleading, almost to the point of being false.

The former approach (which is NOT used by IEEE 754) is viable; the
latter isn't.  Not merely do global flags force a great deal of
serialisation, they are incompatible with almost all advanced
optimisations.  A nightmare problem, that I am trying to explain to
the C++ people at present, is that even purely thread-local status
flags need global synchronisation, because they can become globally
visible through their effects on other instructions.

And that is a real nightmare if you promise sequential consistency
(Lamport), because of the problem of independent reads of independent
writes.  And Microsoft do just that ....
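
The per-operand flag approach described in the quoted text (flags
carried with each value, and rules combining argument values and flags
into result values and flags) might look roughly like the following
Python sketch.  Everything in it is an assumption made for
illustration: the Status flag set, the Flagged type and the add and div
rules are hypothetical and far smaller than anything a real
specification would need.  Note that no global state is involved.

```python
import math
from dataclasses import dataclass
from enum import IntFlag


class Status(IntFlag):
    """Hypothetical, deliberately small set of per-operand flags."""
    NONE = 0
    INVALID = 1
    DIV_BY_ZERO = 2
    OVERFLOW = 4


@dataclass(frozen=True)
class Flagged:
    """A value together with the flags accumulated while computing it."""
    value: float
    flags: Status = Status.NONE


def _result(value, raised, *operands):
    # Result flags = flags raised by this operation, ORed with the
    # flags carried in by every operand.
    flags = raised
    for operand in operands:
        flags |= operand.flags
    return Flagged(value, flags)


def _classify(value, x, y):
    # Flags newly raised when finite or non-NaN inputs give NaN/infinity.
    if math.isnan(value) and not (math.isnan(x.value) or math.isnan(y.value)):
        return Status.INVALID
    if math.isinf(value) and math.isfinite(x.value) and math.isfinite(y.value):
        return Status.OVERFLOW
    return Status.NONE


def add(x, y):
    value = x.value + y.value
    return _result(value, _classify(value, x, y), x, y)


def div(x, y):
    if math.isnan(x.value) or math.isnan(y.value):
        return _result(math.nan, Status.NONE, x, y)
    if y.value == 0.0:
        if x.value == 0.0:
            return _result(math.nan, Status.INVALID, x, y)
        sign = math.copysign(1.0, x.value) * math.copysign(1.0, y.value)
        raised = Status.DIV_BY_ZERO if math.isfinite(x.value) else Status.NONE
        return _result(sign * math.inf, raised, x, y)
    value = x.value / y.value
    return _result(value, _classify(value, x, y), x, y)
```

With these rules, div(Flagged(1.0), Flagged(0.0)) yields an infinity
carrying DIV_BY_ZERO, and that flag is still set on the result of a
later add involving it, because each result inherits its operands'
flags rather than consulting any shared register.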


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  nmm1@xxxxxxxxx
Tel.:  +44 1223 334761    Fax:  +44 1223 334679

