Re: Proposed replacement for 14.4 and C6.2 (interchange encodings) + encoding of compressed intervals
Michel,
> I had started to think in terms of a plain character string, but here
> are several reasons why I changed to the mixed text/binary format:
>
> (0) In the above, 'p1788' is inappropriate; the 'p' (for Project) will
> go away if and when this actually becomes an IEEE standard.
This is my bug.
> (1) I would have had to invent short names for the byte ordering.
> Dima's "MSB" is incomplete, it should be "MSBF" for "Most
> Significant Byte First", but may be better than "BE" or "LE".
Ok.
> (2) I wanted a fixed-length signature, preferably a multiple of 4 bytes.
> That fits more easily into structure layouts in many languages.
If we assume that the range of floating-point format sizes is limited, then the length of the signature is also limited. If the format size is limited to 2^24 octets, i.e. 2^27 bits, then the size in bits has at most 9 decimal digits.
The length of the string ieee1788_encNNNNNNNNN_be_dMM is then at most 28 ASCII characters.
If the signature field is 31 ASCII characters plus a terminating \0 (32 octets, a multiple of 4), then it can encode formats up to 2^39 bits (2^36 octets).
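A minimal sketch of such a signature builder, useful for checking the length arithmetic above. Assumptions not stated in the proposal: "enc" stands for a three-letter format kind such as "bin" or "dec", the size is the format width in bits written in decimal, and the optional dMM suffix gives the decoration width in bits. The names make_signature and to_field are hypothetical.

```python
from typing import Optional

FIELD_OCTETS = 32  # 31 ASCII characters plus a terminating NUL

def make_signature(kind: str, size_bits: int, big_endian: bool,
                   dec_bits: Optional[int] = None) -> str:
    """Build e.g. 'ieee1788_bin64_be' or 'ieee1788_bin64_be_d8'.
    Hypothetical layout: ieee1788_<kind><bits>_<be|le>[_d<dec_bits>]."""
    if len(kind) != 3:
        raise ValueError("format kind must have 3 letters, e.g. 'bin'")
    order = "be" if big_endian else "le"
    sig = f"ieee1788_{kind}{size_bits}_{order}"
    if dec_bits is not None:
        sig += f"_d{dec_bits}"
    if len(sig) > FIELD_OCTETS - 1:
        raise ValueError(f"signature needs {len(sig)} chars, "
                         f"max is {FIELD_OCTETS - 1}")
    return sig

def to_field(sig: str) -> bytes:
    """Pad the ASCII signature with NULs to the fixed 32-octet field."""
    raw = sig.encode("ascii")
    return raw + b"\0" * (FIELD_OCTETS - len(raw))
```

With a 9-digit size and a two-digit MM the string reaches 28 characters; the 3 spare characters of a 31-character field allow a 12-digit size, which holds 2^39 = 549755813888 bits but not 2^40.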
> (3) I want to accommodate ALL 754-2008 formats, including the extendable
> ones that can have arbitrary size (in multiples of 32 bits). Such
> sizes are better represented in binary than in a decimal character
> string.
Parsing the signature does not hurt performance, because it happens only once per data array. As for readability, humans already use decimal sizes: bin64, bin128, ...
> (4) The method of detecting Endianness by looking at a binary field of
> known content is quite general. If in fact the application already
> knows the size (which is likely), this could even detect some messy
> mixed-endian formats, although there are values where all four bytes
> would be the same and hence useless as an indicator. (That's why I
> mentioned 2**24 as a practical upper limit.)
I understand, but I would prefer the signature to be more human-oriented.
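For comparison, the binary detection idea from point (4) can be sketched as follows, assuming a 4-octet unsigned size field whose true value is nonzero and below 2**24. The name detect_order is hypothetical. The sketch also shows the weak spot mentioned there: some values are plausible in both octet orders.

```python
import struct
from typing import Optional

LIMIT = 2 ** 24  # assumed practical upper bound on the stored size

def detect_order(field: bytes) -> Optional[str]:
    """Guess the octet order of a 4-octet unsigned size field.
    Returns '>' (big-endian), '<' (little-endian), or None when both
    readings fall in the plausible range 0 < value < LIMIT (or neither does)."""
    be = struct.unpack(">I", field)[0]
    le = struct.unpack("<I", field)[0]
    be_ok = 0 < be < LIMIT
    le_ok = 0 < le < LIMIT
    if be_ok and not le_ok:
        return ">"
    if le_ok and not be_ok:
        return "<"
    return None
```

For example, 256 stored big-endian (octets 00 00 01 00) reads as 65536 in the other order, which is also below 2**24, so it is reported as ambiguous.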
I have also been thinking about the encoding of compressed intervals and their possible signature.
The compressed data type T_tau associated with the bare data type T could have a signature
that carries the name of the decoration tau in place of the dMM suffix.
For example:
ieee1788_bin64_be - the signature of the plain mapping of the bare interval type infsup_bin64
ieee1788_bin64_be_d8 - the signature of the plain mapping of the decorated interval type infsup_bin64
ieee1788_bin64_be_def - the signature of the plain mapping of the compressed interval type infsup_bin64_def
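Since a signature is parsed only once per data array, a regular expression is cheap enough. The grammar below is an assumption extrapolated from the three examples above (a dNN width suffix or a lowercase decoration name such as def); parse_signature and Signature are hypothetical names.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical grammar inferred from the examples:
#   ieee1788_<bin|dec><bits>_<be|le>[_d<width> | _<decoration name>]
_SIG = re.compile(r"ieee1788_(bin|dec)(\d+)_(be|le)(?:_(d\d+|[a-z]+))?")

class Signature(NamedTuple):
    kind: str              # format kind, e.g. "bin"
    size_bits: int         # format width in bits
    big_endian: bool       # True for "be", False for "le"
    suffix: Optional[str]  # None (bare), "d8" (decorated), "def" (compressed)

def parse_signature(sig: str) -> Signature:
    m = _SIG.fullmatch(sig)
    if m is None:
        raise ValueError(f"unrecognized signature: {sig!r}")
    kind, bits, order, suffix = m.groups()
    return Signature(kind, int(bits), order == "be", suffix)
```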
The encoding of compressed intervals may be defined as follows.
At Level 3 it is a pair. Its components are (inf(x), sup(x)) when x is a bare interval
and (NaN, NaN) when x is a decoration. Different decorations are indistinguishable in the
Level 3 representation of the interval, because its components are Level 2 floating-point datums.
When we refine the fields to a Level 3 floating-point representation,
we choose signaling NaNs (sNaN, sNaN). Different decorations are still indistinguishable.
When we refine the interval and its fields to Level 4 encodings,
we choose signaling NaNs with positive sign: the first 8 bits of the payload hold the encoding of the decoration,
and the trailing bits are zeros.
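A minimal sketch of this Level 4 encoding for binary64 fields in big-endian order. Assumptions not fixed by the text: "the first 8 bits of the payload" means the 8 most significant payload bits just below the quiet bit, and the quiet bit follows the 754-2008 recommended convention (clear for a signaling NaN). The function names are hypothetical, and the decoration octet values are not given here, so they are simply parameters.

```python
import struct
from typing import Optional

# binary64: 1 sign bit, 11 exponent bits, 52 trailing-significand bits.
EXP_MASK = 0x7FF << 52
QUIET_BIT = 1 << 51      # clear in a signaling NaN (754-2008 recommended layout)
PAYLOAD_BITS = 51        # trailing-significand bits below the quiet bit
DEC_SHIFT = PAYLOAD_BITS - 8

def encode_decoration_field(dec: int) -> bytes:
    """One binary64 field (big-endian octets) for decoration octet `dec`:
    positive sign, all-ones exponent, quiet bit clear, the top 8 payload
    bits equal to `dec`, and all remaining payload bits zero."""
    if not 0 < dec <= 0xFF:
        # dec == 0 would leave the significand all zero, i.e. +infinity.
        raise ValueError("decoration octet must be in 1..255")
    return struct.pack(">Q", EXP_MASK | (dec << DEC_SHIFT))

def encode_decoration_pair(dec: int) -> bytes:
    """A compressed interval holding a decoration is the pair (sNaN, sNaN)."""
    field = encode_decoration_field(dec)
    return field + field

def decode_decoration_field(raw: bytes) -> Optional[int]:
    """Return the decoration octet if `raw` is such an sNaN, else None."""
    (bits,) = struct.unpack(">Q", raw)
    if bits >> 63 or (bits & EXP_MASK) != EXP_MASK or bits & QUIET_BIT:
        return None  # negative, not NaN/infinity, or a quiet NaN
    payload = bits & (QUIET_BIT - 1)
    dec = payload >> DEC_SHIFT
    if dec == 0 or payload & ((1 << DEC_SHIFT) - 1):
        return None  # infinity, or nonzero trailing payload bits
    return dec
```

One consequence worth noting: a decoration whose encoding is 0 cannot be carried this way, because an all-zero payload with the quiet bit clear is +infinity rather than a NaN.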
> Btw, I agree with Vincent's later comment that we should more clearly
> state that the interchange format does not attempt to dictate memory
> layout except for the purpose of exporting the data to another medium,
> e.g. a network packet or file stream -- and that we should universally
> use "octets" instead of "bytes", as Dima has started to do.
Ok.
Michel.
---Sent: 2014-06-23 14:03:51 UTC