
Re: possible decision on interchange representation



On Thu, 19 Jun 2014 09:28:07 -0500, Baker Kearfott wrote:
> Let's see what Dima and Michel come up with, and go from there.

I think I want input from a wide range of individuals, which is why
I'm not just talking to the small group who worked out the current
definition of interval interchange formats (sections 14.4 and C6.2).

I read the correspondence (primarily) between Dmitry Nadezhin, Ned
Nedialkov, John Pryce, and Guillaume Melquiond, and see the tug of
war between "Level 3" and "Level 4" concepts, and the insidious
Endianness issue.  At some point even bit numbering entered the
discussion -- apparently without anyone realising that there are also
two ways of numbering bits, with Bit 0 being either the Least or the
Most significant.  (Bit numbering can even be the opposite of byte
numbering, as in the Motorola 68K family.)
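To make the two bit-numbering conventions concrete, here is a minimal C
sketch.  The function names are hypothetical and the 32-bit word width is
an arbitrary assumption; the point is only that the same physical bit
carries a different index under each convention.

```c
#include <assert.h>
#include <stdint.h>

/* Map a bit index from one numbering convention to the other.
   The mapping is its own inverse. */
static unsigned flip_bit_numbering(unsigned index, unsigned width)
{
    assert(width > 0 && index < width);
    return width - 1u - index;
}

/* Test a bit of a 32-bit word, where Bit 0 is the least significant. */
static int test_bit_lsb0(uint32_t word, unsigned index)
{
    assert(index < 32);
    return (int)((word >> index) & 1u);
}

/* Test a bit of a 32-bit word, where Bit 0 is the most significant. */
static int test_bit_msb0(uint32_t word, unsigned index)
{
    return test_bit_lsb0(word, flip_bit_numbering(index, 32));
}
```

The top bit of a 32-bit word is thus "bit 31" under one convention and
"bit 0" under the other, so a bit number is meaningless until the
convention is stated.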

I also read IEEE 754-2008 again, looking to see what (if anything)
it said about Endianness.  My recollection was that this issue was
deliberately left out -- but I thought it actually said so explicitly,
so that one would be aware of the issue.  It turns out it is totally
silent -- the string "endian" does not appear anywhere.

It is indeed possible to ignore Endianness when considering a single
datum, which would typically be found in a register, but may also be
found in memory -- and the mapping of memory to register is generally
understood to be machine-specific:  THIS is where Endianness enters
the picture.
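A small C sketch of that distinction follows.  It assumes that double is
the 754 binary64 format and that the platform stores floating-point and
integer data with the same byte order; both hold on mainstream systems
but neither is guaranteed by C.  The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* The datum as a value: its 64-bit pattern, as it would appear in a
   register.  This is the same on every platform meeting the
   assumptions above. */
static uint64_t double_bits(double x)
{
    uint64_t bits;
    assert(sizeof x == sizeof bits);
    memcpy(&bits, &x, sizeof bits);
    return bits;
}

/* The datum as stored: its bytes in increasing address order.
   This is where the platform's byte order becomes visible. */
static void double_bytes(double x, unsigned char out[8])
{
    assert(sizeof x == 8);
    memcpy(out, &x, 8);
}
```

For 1.0 the bit pattern is 0x3FF0000000000000 on any such platform, while
the byte 0x3F sits at the lowest address on a big-endian machine and at
the highest address on a little-endian one.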

The situation becomes more complicated when a single datum does not
sit in a register.  One can imagine that it would be fetched piecemeal
from memory, and simply assume that memory would be traversed in the
appropriate order, without going into detail.  This is how I would
interpret Clause 3.7, Extended and extendable precisions, of 754-2008.

Real trouble occurs when we consider an aggregate of multiple datums.
The only instance in 754-2008 is arrays used in reduction operations,
but those elements are treated one by one, order being irrelevant.

Now comes P1788, which wants to define an interchange format for an
ordered aggregate of three items:  two number-format datums and one
decoration datum.  The number-format datums are expected to have
their own interchange formats, e.g. as specified by 754-2008, and I
suggested that we use the fairly-standard concept of "small integer"
to encode decorations in a standard manner, taking into account the
narrowest range common to existing small-integer types, which is 0..127
(the non-negative part of the range of signed char in C).

My suggestion is to stay at that level.  This does mean giving up
cross-system bit-level portability; indeed, it does not even require
the decoration to be stored in one byte (some systems might well not
support byte granularity -- they will store their small integers in
bigger containers, according to their own well-defined rules).  The
plus side is that it ALSO means that cross-system portability issues
are the same as for nearly everything else.  Yes, I know that Java
went beyond this and did specify bit-level representations for all
of its datatypes, because (certainly initially) global portability
was its raison d'être.  I suspect however that P1788 would like a
wider range of applicability.
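As an illustration of what "staying at that level" could look like in C,
here is a minimal sketch.  The struct and function names are hypothetical,
and the choice of double for the bounds is an assumption.  Nothing here
fixes byte order, padding, or the width of the decoration; those are left
to the platform.

```c
#include <assert.h>

/* An ordered aggregate of three items: two number-format datums and
   one small-integer decoration.  The in-memory layout (byte order,
   padding, container size) is whatever the platform's own rules say. */
struct decorated_interval {
    double inf;        /* lower bound, in the platform's number format */
    double sup;        /* upper bound, in the platform's number format */
    signed char dec;   /* decoration as a small integer in 0..127 */
};

/* Build the aggregate, rejecting decoration codes outside 0..127. */
static struct decorated_interval make_decorated(double inf, double sup, int dec)
{
    struct decorated_interval x;
    assert(dec >= 0 && dec <= 127);
    x.inf = inf;
    x.sup = sup;
    x.dec = (signed char)dec;
    return x;
}
```

Code written this way moves between systems exactly as well as any other
C aggregate does, which is the trade-off described above.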

So what can we do about this?

We must avoid talking in terms of concatenated bit strings, because
that only makes sense when we do indeed insist on global bit-level
portability.  I would not object to a recommended truly portable
representation -- but note that this would get us into political
issues beyond Endianness, because we would have to take sides in
the BID vs DPD encoding of 754-2008 Decimal formats!  I note that
current 14.4 says that the choice between BID and DPD would be a
"parameter of interchange encoding".  But if we acknowledge such
parameters, why can't the Endianness used by the platform also be
such a parameter?  Would that not solve this dilemma?

I also note that 14.4 restricts itself to 754-conforming implementations.

So I think the solution is to leave "Level 4" issues *entirely* to the
IEEE 754-2008 standard, with the exception of decorations (because the
754 standard explicitly excludes specification of integer formats).
We can give bit-level examples for both big-endian and little-endian
machines, but we should refer the reader to 754-2008 for details, and
not try a partial explanation.  (The current bitstring exposition is
indeed incomplete since it doesn't mention the hidden unit bit, though
it is technically correct since it only "describes" the significand.)

For decorations, I suggest the following small-integer encoding:
   ill    0
   trv    4
   def    8
   dac   12
   com   16
with a remark "This encoding permits future refinement without
disturbing the natural propagation order of the decorations,
and fits within the non-negative range of a C signed char, namely 0..127."
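The proposed codes can be sketched in C as follows.  The identifier names
are hypothetical, and the sketch assumes that the propagation order is
ill < trv < def < dac < com and that combining two decorations takes the
weaker (smaller) one.

```c
#include <assert.h>

/* Decoration codes from the table above.  The gaps of 4 leave room
   for intermediate values without changing the ordering. */
enum decoration {
    DEC_ILL = 0,
    DEC_TRV = 4,
    DEC_DEF = 8,
    DEC_DAC = 12,
    DEC_COM = 16
};

/* Return 1 if the code is one of the five currently defined values. */
static int decoration_is_defined(int code)
{
    return code == DEC_ILL || code == DEC_TRV || code == DEC_DEF ||
           code == DEC_DAC || code == DEC_COM;
}

/* Combine two decorations: the result is the weaker of the two,
   which with this encoding is simply the smaller integer. */
static int decoration_min(int a, int b)
{
    assert(a >= 0 && a <= 127 && b >= 0 && b <= 127);
    return a < b ? a : b;
}
```

Because the ordering is carried by the integer values themselves,
combining decorations needs no lookup table, and a refinement inserted
later (say, a code between 8 and 12) would sort correctly with no change
to decoration_min.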

I will now try to come up with actual text replacement for sections 14.4
and C6.2 -- the latter also needs to be cleansed of bit-string mentions,
notwithstanding my first impressions.  Meanwhile I'll be monitoring your
comments.  Please don't be silent; we don't have much time.

Michel.
---Sent: 2014-06-20 21:56:05 UTC