While this is relevant, it is about implementation and not specification,
so I shall make this my last post. But Ivan's postings show very clearly
what I am saying - that the knowledge of the 1960s on how to do this
easily and efficiently has been all but lost, and THAT is a major cause
of the problems. Whether IEEE754-R can do anything about it is another
matter entirely.
Ivan Godard <igodard@xxxxxxxxxxx> wrote:
> All of these handlers are part of the OS kernel and are intimately
> dependent on the internal operations thereof.
Yes. And that is what I am saying causes the implementation problems,
which in turn cause the inefficiency - and is completely unnecessary!
> If however you want (as
> you appear to do) a generalized user-level emulation capability, where
> arbitrary user code can be used in the handler, then you have set
> yourself a very difficult task. This is because atomicity at the source
> level does not in general correspond to atomicity at the machine level;
> that's what sequence points are all about, with all the armwaving that
> you complain about.
No, not at all. Twice. Firstly, it's not as hard as you make out
(and has been done many times), but I agree doing it generally can be
done only if it is properly designed into the architecture. Secondly
and more importantly, I am not talking about a fully general emulation
scheme.
> For a quite realistic example, say that you have a
> program doing work on doubles on hardware with 32-bit memory
> instructions. Your app has a few globals that are also referenced from
> your handler. Depending on the exact instruction sequence used by the
> compiler, it is quite possible that your handler will find the front
> half of some global double to reflect a different value version than
> the back half, because the two halves are written by two different
> memory instructions and your interrupt happened to come between them.
> The same problem occurs in quad on a machine with 64-bit memory
> instructions.
Which has nothing whatsoever to do with floating-point emulation on
a load/store/operate RISC design and precious little even on an
x86-, 370- or VAX-style CISC. Let me explain how it could be done.
The operation is internally turned into a RISC form (i.e. load into
registers, operate, and store) - as the Pentium 4 is reported to do
and many other systems have done. This facility affects solely the
operation.
If emulation is required, some registers are stored in a save area,
the operands are put into fixed registers and the emulator is called
as a normal function with a suitable stack pointer. Its specification
is that, if the function changes anything other than the saved and result
registers, flags and workspace, the effect is unspecified. Upon return,
the result is copied into the right result register, the saved registers
restored and the flags merged.
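
As a rough illustration of the sequence just described, here is a small
software model in C. It is a sketch only, not a description of any real
hardware: the register count, the use of registers 0 and 1 as the fixed
operand registers (with the result returned in register 0), the flag bits
and every name in it (cpu_state, trap_to_emulator, emulate_add,
emulate_div) are assumptions made up for the example.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical machine model: 8 FP registers and a sticky flag word. */
enum { NUM_REGS = 8, FIXED_A = 0, FIXED_B = 1, FIXED_RESULT = 0 };
enum { FLAG_INEXACT = 1, FLAG_DIVZERO = 2 };

typedef struct {
    double   regs[NUM_REGS];
    unsigned flags;
} cpu_state;

/* An emulator is an ordinary function: it reads the fixed operand
   registers, writes the fixed result register and returns the flags
   it raised. Touching anything else is outside its specification. */
typedef unsigned (*emulator_fn)(double *regs);

unsigned emulate_add(double *regs)
{
    regs[FIXED_RESULT] = regs[FIXED_A] + regs[FIXED_B];
    return 0;
}

unsigned emulate_div(double *regs)
{
    if (regs[FIXED_B] == 0.0) {
        /* Simplified: sign and 0/0 handling are ignored in this sketch. */
        regs[FIXED_RESULT] = INFINITY;
        return FLAG_DIVZERO;
    }
    regs[FIXED_RESULT] = regs[FIXED_A] / regs[FIXED_B];
    return 0;
}

/* Models what the hardware would do for "dst = src_a op src_b". */
void trap_to_emulator(cpu_state *cpu, emulator_fn emulate,
                      int src_a, int src_b, int dst)
{
    assert(src_a >= 0 && src_a < NUM_REGS);
    assert(src_b >= 0 && src_b < NUM_REGS);
    assert(dst >= 0 && dst < NUM_REGS);

    /* 1. Store the registers the emulator may use in a save area. */
    double save_area[2] = { cpu->regs[FIXED_A], cpu->regs[FIXED_B] };

    /* 2. Put the operands into the fixed registers. */
    double a = cpu->regs[src_a];
    double b = cpu->regs[src_b];
    cpu->regs[FIXED_A] = a;
    cpu->regs[FIXED_B] = b;

    /* 3. Call the emulator as a normal function. */
    unsigned raised = emulate(cpu->regs);
    double result = cpu->regs[FIXED_RESULT];

    /* 4. Restore the saved registers, then copy the result into the
          real destination (in that order, so dst may be 0 or 1). */
    cpu->regs[FIXED_A] = save_area[0];
    cpu->regs[FIXED_B] = save_area[1];
    cpu->regs[dst] = result;

    /* 5. Merge the emulator's flags into the thread's sticky flags. */
    cpu->flags |= raised;
}
```

The point of the model is that the interrupted code sees only a changed
destination register and possibly some extra flag bits; everything else
the emulator used is put back before it resumes.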
On an out of order architecture, or anything similar, the hardware
also needs to deal with dependent registers, reordered operations and
look-ahead. But that is only what it has to do for branch misses and
interrupts anyway and, in THIS case, it can optimise that because it
knows that the emulator will return 'transparently'.
Yes, that really IS all that is needed. If you want to allow the function
to execute general code, you need to add an ability for it to get into
a standard state, but that is not needed for the simple emulations.
Scheduling and all that it entails treats the thread and any active
handler (and the save area) as an entity, so there is no problem there.
Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email: nmm1@xxxxxxxxx
Tel.: +44 1223 334761 Fax: +44 1223 334679