
Michel's comments on interchange representation



> IMO the map between this and physical storage, defined by the hardware's
> endianness and possibly by the compiler, is a "Level 5" that is irrelevant
> to our standard.  Yes, there are problems if one machine puts the bytes
> onto a serial medium in a different order from how another machine expects
> them but I don't think it's the standard's job to say how this should be
> solved.

Essentially all of today's machines are byte-addressed and the memory
is effectively a long byte string.  When that byte string is placed on
a byte-serial medium, e.g. a network packet, that order is preserved,
which is why network software has to prepare a memory image of a packet
before sending it.  For "network byte order" on Little-Endian machines,
that means that *individual packet fields* have to be stored in non-native
order, just as Big-Endian machines have to poke fields in PCI-defined
structures in an order foreign to them (PCI is Little-Endian).
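To make the point concrete, here is a minimal C sketch of preparing a memory image of a 32-bit packet field in network byte order. The helper name `put_u32_net` is my invention; `htonl()` is the standard POSIX conversion, which reverses bytes on a Little-Endian host and is the identity on a Big-Endian one, so the serialized bytes come out the same everywhere:

```c
#include <arpa/inet.h>   /* htonl */
#include <stdint.h>
#include <string.h>

/* Store a 32-bit field into a packet buffer in network (Big-Endian)
   byte order.  On a Little-Endian machine this means the field is
   stored in non-native order, exactly as described above. */
static void put_u32_net(uint32_t field, unsigned char out[4])
{
    uint32_t wire = htonl(field);      /* host order -> network order */
    memcpy(out, &wire, sizeof wire);   /* byte-for-byte memory image  */
}
```

Whatever the host's endianness, the buffer ends up holding the bytes 01 02 03 04 for the value 0x01020304.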

> Am I missing something crucial here?

Yes!  Namely the fact that Endianness is defined for individual typed
fields, such as int16, int32, int64, float, double, etc. -- and not for
aggregates.  Not too long ago, especially for non-IEEE floating-point
formats, the rules were much messier than simple byte reversal -- for
example, bytes reversed in pairs within the representation.
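The distinction matters in code, too.  A sketch (the record layout here is purely illustrative): to move a record between platforms of opposite Endianness, each typed field is byte-reversed individually; reversing the aggregate as a whole would scramble it.

```c
#include <stdint.h>

/* A hypothetical record with two typed fields of different widths. */
struct rec { uint16_t a; uint32_t b; };

static uint16_t swap16(uint16_t x) { return (uint16_t)((x >> 8) | (x << 8)); }

static uint32_t swap32(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000FF00u)
         | ((x << 8) & 0x00FF0000u) | (x << 24);
}

/* Endianness conversion is per typed field, never per aggregate. */
static void swap_rec(struct rec *r)
{
    r->a = swap16(r->a);
    r->b = swap32(r->b);
}
```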

So we CAN define the interval interchange format as suggested in C6.2,
and apparently as originally intended, namely as an ordered triple of
standard objects, each object being represented at Level 4 in its usual
format for the platform.  This does not resolve portability issues due
to differing Endianness, but it reduces them to the same problem faced
by almost every other interchange format.  I should point out that COBOL
tried to address the issue, but I think even that effort did not deal
with it completely; some record tagging issues remain (the last time I
looked, January 2012).

However, this means that we must avoid talking in terms of bit strings.
There is no need to repeat the description of the *conceptual* layout
of floating-point objects; 754-2008 does that already, and for non-IEEE
formats it would be futile (e.g. hidden bit or not, bias, radix, etc.).

So the text of C6.2 is fine, but the example should point out explicitly
that it shows a Big-Endian layout.  Better yet, show both layouts, and
perhaps mention one or two current machine architectures for each.
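For instance, the binary64 value 1.0 has the bit pattern 0x3FF0000000000000: its Big-Endian byte layout (e.g. SPARC, POWER) is 3F F0 00 00 00 00 00 00, while the Little-Endian layout (e.g. x86, most ARM configurations) is the byte-reversed 00 00 00 00 00 00 F0 3F.  A small sketch that emits the Big-Endian layout regardless of host (the helper name is my own):

```c
#include <stdint.h>
#include <string.h>

/* Write the Big-Endian byte layout of a binary64, independent of how
   the host itself stores doubles in memory. */
static void f64_bytes_be(double d, unsigned char out[8])
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);   /* raw bit pattern of the double */
    for (int i = 0; i < 8; i++)
        out[i] = (unsigned char)(bits >> (56 - 8 * i));
}
```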


This brings me to the encoding of the decorations.  The current layout
fits well with the global-bitstring approach (which has perhaps not yet
been ruled out, as Baker suggests we may need a separate vote), but it
becomes awkward when we describe the triple in terms of existing formats:
we just invented a new datatype!  In byte-oriented systems a single byte
does not have Endianness issues -- but on word-oriented machines (which
IEEE 754 does not rule out) it does raise a whole new set of issues,
namely how to align the 8-bit item in a larger container.  It would be
much better in my opinion to map decorations on "small integers", which
is a fairly standard datatype (called "char" in C -- which may however
have more than 8 bits).  Then the decorations would be stored in whatever
way small integers are stored, and portability issues become standard,
even though they don't go away.
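A sketch of what I mean, using C's unsigned char as the "small integer" (the enum values follow the draft's current list, quoted just below):

```c
/* Decorations mapped onto small integers; the values are those of the
   current draft (the 0..4 alternative would work the same way). */
enum decoration { ILL = 0, TRV = 32, DEF = 64, DAC = 96, COM = 128 };

/* Stored like any other small integer -- no new datatype invented. */
static unsigned char encode_decoration(enum decoration d)
{
    return (unsigned char)d;
}
```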

The next question, then, is why these particular values were chosen.
(I know: it's because somebody *was* thinking about concatenating bit
strings.)
   ill    0
   trv   32
   def   64
   dac   96
   com  128

Right away, we run into an issue for some implementations:  128 is
not "small enough" for CHAR when CHAR is considered to be signed!

Would it not have been better to use 0 through 4, as I seem to recall
we had at one point?
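To spell out the problem: on the common platforms where char is 8 bits, SCHAR_MAX is 127, so 128 overflows a signed char, while 0 through 4 fit in any char, signed or unsigned.  A minimal check (the helper name is my own):

```c
#include <limits.h>

/* Does a decoration value fit in a plain char even when char is
   signed?  On 8-bit-char platforms SCHAR_MAX is 127, so 128 fails. */
static int decoration_fits_signed_char(int v)
{
    return v >= SCHAR_MIN && v <= SCHAR_MAX;
}
```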

So you see, John, that this (what appears to be a) late change DOES
have substantial consequences, and needs more than editorial changes.
Frankly, when I first raised the issue, the decoration encoding had
been only slightly annoying, but now the "signed char" issue makes it
a serious issue.

                       *--------------*

Now, what if P1788 *does* want to go beyond 754 and prescribe truly
portable binary interchange formats?  It would restrict itself to
platforms that support arbitrary byte strings (not really an issue
today, I think -- but who knows, the HPC people may find reasons to
build bigword-based machines), and encode 754-derived intervals as
bit strings of 2n+1 bytes (i.e. 16n+8 bits), expected to be stored
consecutively if stored as an aggregate in a file.  This would then
indeed be blindly portable across all platforms that can handle byte
strings: TCP, Posix files, shared-memory, etc.
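For binary64 intervals that would be 2*8+1 = 17 bytes.  A sketch of such an encoding -- the choice of Big-Endian wire order and of placing the decoration byte last are my assumptions, not anything the draft prescribes:

```c
#include <stdint.h>
#include <string.h>

/* Write one binary64 bound in a fixed (Big-Endian) wire order. */
static void put_f64_be(double d, unsigned char *out)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    for (int i = 0; i < 8; i++)
        out[i] = (unsigned char)(bits >> (56 - 8 * i));
}

/* Encode an interval as a 17-byte string: lower bound, upper bound,
   decoration -- portable across any platform handling byte strings. */
static void encode_interval(double lo, double hi, unsigned char dec,
                            unsigned char out[17])
{
    put_f64_be(lo, out);
    put_f64_be(hi, out + 8);
    out[16] = dec;
}
```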


Earlier, Dima wrote something that surprised me:

> The section 14 defines two things:
>
> 1) level 3 interchange representation as a triple
>    of two number datums and a decoration;
> 2) level 4 interchange encoding as a bit string.

Now THAT was news to me, and it certainly is not apparent from the
document.  If we want two separate interchange formats, one for
sharing among like-minded systems and another for truly global
sharing, we will need two separate export and import functions,
and we need to be explicit about it!

Michel.
---Sent: 2014-06-18 14:46:49 UTC