
Re: Michel's comments on interchange representation



John,

It seems to me that when we say "interchange format", then
- some people understand it as the format of a file or of a network stream
used to interchange data between different machines;
- other people understand it as a representation in memory.

They use the same word, and this causes a mess.

I just want to define "interchange format" for file exchange.
Usually files are treated as a sequence of bytes.
You said not to use "byte".
OK, I reformulated this as a sequence of bits.

The file interchange format doesn't need to know all
these memory details like alignment, endianness, or the position of the sign bit.
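
A minimal sketch of this point in Python, for illustration only. It assumes a binary64 value is written most significant byte first; that byte order and the names encode_binary64 / decode_binary64 are hypothetical choices, not anything P1788 has fixed. Because the byte order is stated explicitly in the format string, the bytes produced are the same whatever the memory layout of the writing machine is.

```python
import struct

def encode_binary64(x: float) -> bytes:
    """Write x as 8 bytes, most significant byte first (an assumed convention)."""
    return struct.pack(">d", x)  # ">" fixes the byte order regardless of the host

def decode_binary64(data: bytes) -> float:
    """Read back a value written by encode_binary64."""
    return struct.unpack(">d", data)[0]

encoded = encode_binary64(1.0)
print(encoded.hex())             # 3ff0000000000000
print(decode_binary64(encoded))  # 1.0
```

The reader and the writer only have to agree on the byte sequence; neither needs to know how the other stores a double in memory.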

> - Don't try to remove the portability difficulty, but make it (as you say below) "standard".

I don't understand this sentence. Please explain.

What is your opinion? Is it possible to define an exchange format such
that we can use it for exchange between machines with different endianness,
alignment, or other architecture details?
Will P1788 become better if we define this format in addition to the portable text interchange format?

Do you agree that an interchange format might accidentally match the memory layout on some machines
and hence make proponents of other machines unhappy?
Should we instead invent an interchange format that equally mismatches the memory layout of all machines?

  -Dima

----- Original Message -----
From: j.d.pryce@xxxxxxxxxx
To: mhack@xxxxxxx, dmitry.nadezhin@xxxxxxxxxx, rbk@xxxxxxxxxxxxx
Cc: stds-1788@xxxxxxxxxxxxxxxxx
Sent: Thursday, June 19, 2014 3:04:04 PM GMT +04:00 Abu Dhabi / Muscat
Subject: Re: Michel's comments on interchange representation

Michel
(and Dmitry, Baker)

On 2014 Jun 18, at 14:28, Michel Hack wrote:
> (John wrote:)
>> Am I missing something crucial here?
> 
> Yes!  Namely the fact that Endianness is defined for individual typed
> fields, such as int16, int32, int64, float, double, etc. -- and not for
> aggregates.  Not too long ago, especially for non-IEEE floating-point
> formats, the rules were much messier than simple byte reversal, as for
> example bytes reversed in pairs within the representation.
> 
> So we CAN define the interval interchange format as suggested in C6.2,
> and apparently as originally intended, namely as an ordered triple of
> standard objects, each object being represented at Level 4 in its usual
> format for the platform.  This does not resolve portability issues due
> to differing Endianness, but it reduces them to the same problem faced
> by almost every other interchange format...

I think you and Dmitry should work this out between you and produce the revised wording. 

Baker, as it is substantive, does it need a separate motion?

> So the text of C6.2 is fine, but the example should point out explicitly
> that it shows a Big-Endian layout.  Better yet, show both layouts, and
> perhaps mention one or two current machine architectures for each.

So it seems you are suggesting
- Make the main text in §14.4 closer to Ned's wording in C6.2.
- Don't try to remove the portability difficulty, but make it (as you say below) "standard".

That sounds OK to me. Anyone object?

> This brings me to the encoding of the decorations.  The current layout
> fits well with the global-bitstring approach (which has perhaps not yet
> been ruled out, as Baker suggests we may need a separate vote), but it
> becomes awkward when we describe the triple in terms of existing formats:
> we just invented a new datatype!  In byte-oriented systems a single byte
> does not have Endianness issues -- but on word-oriented machines (which
> IEEE 754 does not rule out) it does raise a whole new set of issues,
> namely how to align the 8-bit item in a larger container.  It would be
> much better in my opinion to map decorations on "small integers", which
> is a fairly standard datatype (called "char" in C -- which may however
> have more than 8 bits).  Then the decorations would be stored in whatever
> way small integers are stored, and portability issues become standard,
> even though they don't go away.
> 
> Next question then is why were the particular values chosen?  (I know,
> it's because somebody *was* thinking about concatenating bit strings.)
>   ill    0
>   trv   32
>   def   64
>   dac   96
>   com  128
> 
> Right away, we run into an issue for some implementations:  128 is
> not "small enough" for CHAR when CHAR is considered to be signed!
> 
> Would it not have been better to use 0 through 4, as I seem to recall
> we had at one point?

Now 0 through 4 has the advantage of being simple and natural. I was rather attracted to Dmitry's "multiply by 32" (or shift 5 bits left) because of his argument "If an implementer wishes to invent new decorations in between the existing ones, this lets them do it easily".

However that is a possibility for some time in the future, and even then it is not obvious that it will save anyone much work. Whereas "small integer" exists now. And if there are indeed systems which take CHAR to be signed, Dmitry's formula gives a problem.
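
The signed-CHAR problem can be checked with a small Python sketch. It assumes an 8-bit signed char, which is what the "b" format of the struct module models; the table and function names are hypothetical and only illustrate the two candidate encodings discussed here.

```python
import struct

# The two candidate encodings from the discussion above.
DECORATIONS_TIMES_32 = {"ill": 0, "trv": 32, "def": 64, "dac": 96, "com": 128}
DECORATIONS_0_TO_4 = {"ill": 0, "trv": 1, "def": 2, "dac": 3, "com": 4}

def fits_signed_char(value: int) -> bool:
    """True if value can be stored in an 8-bit signed char (-128..127)."""
    try:
        struct.pack("b", value)  # "b" is a signed char, one byte
        return True
    except struct.error:
        return False

def out_of_range(table: dict) -> list:
    """Names of decorations whose code does not fit a signed char."""
    return [name for name, code in table.items() if not fits_signed_char(code)]

print(out_of_range(DECORATIONS_TIMES_32))  # ['com']
print(out_of_range(DECORATIONS_0_TO_4))    # []
```

Only com = 128 falls outside the signed range; the values 0 to 4 fit whether CHAR is signed or unsigned.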

One could consider a compromise: say shift 3 bits left (or 2, or 4). But KISS applies. If I understand right, Michel's principle is that one should take a decorated interval as a conceptual concatenation of 3 standard datatypes, and not go any nearer bitstrings than that.

I'm OK with that approach, and in that case I favour decorations being "small integers" from 0 to 4, on KISS grounds.
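
As a rough sketch of what this could look like, here is a decorated interval written in Python as a concatenation of three standard objects. Several things are assumptions made only for the example: the order (lower bound, upper bound, decoration), the use of binary64 for the bounds, an unsigned 8-bit item for the small integer, and all the names. The byte_order argument shows that endianness affects each typed field separately and does not touch the single-byte decoration.

```python
import struct

DECORATION_CODES = {"ill": 0, "trv": 1, "def": 2, "dac": 3, "com": 4}
DECORATION_NAMES = {code: name for name, code in DECORATION_CODES.items()}

def pack_decorated_interval(lo, hi, decoration, byte_order="<"):
    """Concatenate two binary64 bounds and one small-integer decoration.

    byte_order is "<" (little-endian) or ">" (big-endian); with either
    prefix struct inserts no alignment padding, so the result is 17 bytes.
    """
    return struct.pack(byte_order + "ddB", lo, hi, DECORATION_CODES[decoration])

def unpack_decorated_interval(data, byte_order="<"):
    """Inverse of pack_decorated_interval for the same byte_order."""
    lo, hi, code = struct.unpack(byte_order + "ddB", data)
    return lo, hi, DECORATION_NAMES[code]

little = pack_decorated_interval(1.0, 2.0, "com", "<")
big = pack_decorated_interval(1.0, 2.0, "com", ">")
print(little.hex())
print(big.hex())
print(unpack_decorated_interval(big, ">"))  # (1.0, 2.0, 'com')
```

The two layouts differ only inside each 8-byte bound; the portability question is then the usual one of agreeing on a byte order per field.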

Comments from others please.

John Pryce