
Re: Michel's comments on interchange representation



Dmitry

On 2014 Jun 19, at 12:29, Dmitry Nadezhin wrote:
> It seems to me that when we say "interchange format", then
> - some people understand it as a format of a file or of a network stream
> to interchange between different machines;
> - other people understand it as a representation in memory.
> 
> They use the same word and it causes a mess.
> 
> I just want to define "interchange format" for file exchange.
> Usually files are treated as a sequence of bytes.
> You said not to use "byte".
> OK, I reformulated this as a sequence of bits.
> 
> The file interchange format doesn't need to know all
> these memory details like alignment, endianness, position of sign bit.

You and Michel are experts in this area, and I am not. There seem to be trade-offs involved, and I think it is up to you two to decide the best trade-off and then turn it into words for the text. However, for what my opinion is worth:

(1) I see IF as a format for file (or stream) exchange.

(2) Michel *does* approve the "conceptual bitstring". He wrote:
> I think it was a good idea to describe the entire (2n+1)-byte object as an
> ordered bitstring, and would just add explicitly that "the bytes are to be
> in Network Byte Order".  But we must then change the description in terms
> of a triple of two 754-objects and a one-byte decoration, and point out
> that the byte order of the entire interchange-format object is prescribed
> (which goes beyond what 754 prescribes).

Let's just do that. If this *does* go beyond what 754 prescribes, that's OK with me.

The point is not, IMO, to produce a file layout that can be dumped from or to a machine's memory with *zero* work, but one that (a) is natural and (b) can be dumped with *trivial* work.
For (a) Network Byte Order is natural to those of us who read text from left to right and are used to reading the most significant end of a number first; this may be different for speakers of some other languages.
For (b), I count work that just involves bit-shuffling, in a fixed pattern, as trivial. If we have 4-byte numbers, then the decorated interval X = [x,y]_d is (x3 x2 x1 x0 y3 y2 y1 y0 d) in Network Byte Order, where x3 is the most significant byte of x, etc. Dumping this to a little-endian machine as the 4-byte items (x0 x1 x2 x3) and (y0 y1 y2 y3), and the byte d, is bit-shuffling of this kind, so is trivial.
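
A minimal sketch of the byte-shuffling in (b), assuming 4-byte IEEE 754 binary32 bounds and a one-byte decoration. The function names and the decoration value used in the usage note are hypothetical, chosen only for illustration; nothing here is taken from the draft text.

```python
import struct


def pack_decorated_interval(x, y, d):
    """Pack [x, y]_d as (x3 x2 x1 x0 y3 y2 y1 y0 d), most significant byte first."""
    # ">" selects big-endian (Network Byte Order) with no padding;
    # "f" is a 4-byte binary32 number, "B" is one unsigned byte.
    return struct.pack(">ffB", x, y, d)


def to_little_endian_items(record):
    """Split a 9-byte record into little-endian x, little-endian y, and d.

    The only work is reversing the bytes of each 4-byte number in a
    fixed pattern; the decoration byte is copied unchanged.
    """
    x_be, y_be, d = record[0:4], record[4:8], record[8]
    return x_be[::-1], y_be[::-1], d
```

For example, `pack_decorated_interval(1.0, 2.0, 4)` yields a 9-byte record (the decoration value 4 is an arbitrary placeholder), and `to_little_endian_items` turns it into the items (x0 x1 x2 x3), (y0 y1 y2 y3) and d that a little-endian machine would hold in memory.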

Recall that for large arrays of decorated intervals, a compiler might decide it is more efficient to store an array of x's, an array of y's and an array of d's, in separate areas of memory. So while it's natural for the 3 pieces of X to be contiguous on file, in a program this needn't be the case.
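
To illustrate that the file layout and the in-memory layout can differ, here is a rough sketch that keeps the x's, y's and d's in three separate Python lists but writes each interval as one contiguous 9-byte record. It assumes 4-byte binary32 bounds and a one-byte decoration; the function names are hypothetical.

```python
import struct

RECORD = ">ffB"  # big-endian: x (4 bytes), y (4 bytes), d (1 byte), no padding


def write_interval_stream(xs, ys, ds):
    """Interleave three separate arrays into contiguous 9-byte records."""
    out = bytearray()
    for x, y, d in zip(xs, ys, ds):
        out += struct.pack(RECORD, x, y, d)
    return bytes(out)


def read_interval_stream(data):
    """Split a stream of 9-byte records back into three separate arrays."""
    xs, ys, ds = [], [], []
    # iter_unpack requires len(data) to be a multiple of the record size.
    for x, y, d in struct.iter_unpack(RECORD, data):
        xs.append(x)
        ys.append(y)
        ds.append(d)
    return xs, ys, ds
```

The conversion in either direction is again a fixed rearrangement of bytes, independent of how a particular program chooses to hold the three pieces in memory.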

>> - Don't try to remove the portability difficulty, but make it (as you say below) "standard".
> I don't understand this sentence. Please explain:
I was just quoting Michel's email.

> Is it possible to define an exchange format such
> that we can use it for exchange between machines with different endianness,
> alignment, or other architecture details?
Of course.

> Will P1788 become better if we define this format in addition to portable text interchange format ?
Yes IMO.
> Do you agree that the interchange format might accidentally match the memory layout on some machines
Yes
> and hence make proponents of other machines unhappy ?
If they are so small-minded, yes.
> Should we instead invent an interchange format that equally mismatches the memory layout of all machines?
No!!

John