
Re: Proposed replacement for 14.4 and C6.2 (interchange encodings) + encoding of compressed intervals



Michel,

> I had started to think in terms of a plain character string, but here
> are several reasons why I changed to the mixed text/binary format:
> 
> (0)  In the above, 'p1788' is inappropriate; the 'p' (for Project) will
>     go away if and when this actually becomes an IEEE standard.

This is my bug.

> (1)  I would have had to invent short names for the byte ordering.
>     Dima's "MSB" is incomplete, it should be "MSBF" for "Most
>     Significant Byte First", but may be better than "BE" or "LE".

Ok.

> (2)  I wanted a fixed-length signature, preferably a multiple of 4 bytes.
>     That fits more easily into structure layouts in many languages.

If we assume that the size of a floating-point format is bounded, then the length
of the signature string is bounded too. If the format size is limited to 2^24 octets
(2^27 bits), then the size needs at most 9 decimal digits, and the string
ieee1788_encNNNNNNNNN_be_dMM is at most 28 ASCII characters long.
If the signature is 31 ASCII characters plus a terminating \0 (32 octets, a multiple
of 4 as you wanted), then it can encode formats up to 2^39 bits (2^36 octets).
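The length arithmetic above can be checked with a small sketch. The helpers `enc_signature` and `to_fixed_field` are hypothetical names, not part of any proposal text; "d99" stands in for the worst case of the two-digit dMM placeholder.

```python
SIG_OCTETS = 32  # 31 ASCII characters + terminating NUL: a multiple of 4 octets

def enc_signature(size_bits, order="be", suffix="d99"):
    """Hypothetical builder for the ieee1788_enc<N>_<order>_<suffix> form."""
    return "ieee1788_enc%d_%s_%s" % (size_bits, order, suffix)

def to_fixed_field(sig):
    """Pack a signature into a NUL-padded fixed-size ASCII field."""
    raw = sig.encode("ascii")
    if len(raw) > SIG_OCTETS - 1:  # keep room for at least one NUL
        raise ValueError("signature too long for a %d-octet field" % SIG_OCTETS)
    return raw.ljust(SIG_OCTETS, b"\0")
```

For example, `enc_signature(2**27)` is 28 characters, `2**39` (12 digits) still fits in the 32-octet field, and `2**40` (13 digits) is rejected.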

> (3)  I want to accommodate ALL 754-2008 formats, including the extendable
>     ones that can have arbitrary size (in multiples of 32 bits).  Such
>     sizes are better represented in binary than in a decimal character
>     string.

Parsing the signature does not hurt performance because it happens only once per
data array. As for readability, humans already use decimal sizes: bin64, bin128, ...

> (4)  The method of detecting Endianness by looking at a binary field of
>     known content is quite general.  If in fact the application already
>     knows the size (which is likely), this could even detect some messy
>     mixed-endian formats, although there are values where all four bytes
>     would be the same and hence useless as an indicator.  (That's why I
>     mentioned 2**24 as a practical upper limit.)
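The known-content detection described in point (4) might look like the sketch below, assuming the reader already knows the value of a 4-octet unsigned field. The name `detect_byte_order`, the label "LSBF" (by analogy with "MSBF") and the choice of PDP-11 word order as the example mixed-endian layout are all assumptions for illustration.

```python
import struct

def detect_byte_order(field, expected):
    """Classify the octet order of a 4-octet field whose value is known."""
    if len(field) != 4:
        raise ValueError("expected a 4-octet field")
    big = struct.pack(">I", expected)
    images = {
        "MSBF": big,                            # most significant octet first
        "LSBF": struct.pack("<I", expected),    # least significant octet first
        # PDP-11 style mixed order: high 16-bit word first, each word LSB first
        "PDP": bytes([big[1], big[0], big[3], big[2]]),
    }
    matches = [name for name, image in images.items() if image == field]
    # No match, or a value that looks the same in several orders
    # (e.g. four equal octets), carries no usable information.
    return matches[0] if len(matches) == 1 else None
```

With `expected = 64` the three images are all distinct, so each order is recognized; with `0x01010101` every image is identical and the function returns `None`, which is the limitation noted in the quoted text.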

I understand, but I would prefer a signature that is more human-oriented.

I am also now thinking about the encoding of compressed intervals and their possible signatures.
The compressed type T_tau associated with a bare interval type T could have a signature
with the name of the decoration tau in place of the dMM suffix.
For example:
ieee1788_bin64_be      - is a signature of plain mapping of bare interval infsup_bin64
ieee1788_bin64_be_d8   - is a signature of plain mapping of decorated interval infsup_bin64
ieee1788_bin64_be_def  - is a signature of plain mapping of compressed interval infsup_bin64_def
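Since the signature is parsed once per array, a simple regular-expression parser is enough. This sketch assumes the grammar `ieee1788_<format><digits>_<be|le>[_<suffix>]`, treats a suffix `d<digits>` as decorated and one of the IEEE 1788 decoration names as compressed; `parse_signature` is a hypothetical name and the meaning of dMM is left open.

```python
import re

DECORATIONS = {"com", "dac", "def", "trv", "ill"}  # IEEE 1788 decoration names

SIG_RE = re.compile(r"ieee1788_([a-z]+)(\d+)_(be|le)(?:_([a-z0-9]+))?")

def parse_signature(sig):
    """Split a signature into format, octet order and interval kind."""
    m = SIG_RE.fullmatch(sig)
    if m is None:
        raise ValueError("not an ieee1788 signature: %r" % sig)
    family, size, order, suffix = m.groups()
    info = {"format": family + size, "order": order}
    if suffix is None:
        info["kind"] = "bare"
    elif suffix in DECORATIONS:           # checked before d<digits>: "def", "dac"
        info["kind"] = "compressed"
        info["tau"] = suffix
    elif re.fullmatch(r"d\d+", suffix):
        info["kind"] = "decorated"
        info["dec"] = suffix              # dMM left uninterpreted here
    else:
        raise ValueError("unknown suffix: %r" % suffix)
    return info
```

Applied to the three examples above, it reports bare, decorated and compressed (tau = def) respectively.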

The encoding of compressed intervals may be defined as follows.
At Level 3 it is a pair. Its components are (inf(x),sup(x)) when x is a bare interval
and (NaN,NaN) when x is a decoration. Different decorations are indistinguishable in this
Level 3 representation of the interval because its components are Level 2 floating-point datums.

When we refine the fields to Level 3 floating-point representations,
we choose signaling NaNs (sNaN,sNaN). Different decorations are still indistinguishable.

When we refine the interval to a Level 4 encoding and its fields to Level 4 encodings,
we choose signaling NaNs with positive sign: the first 8 bits of the payload hold the
encoding of the decoration, and the trailing bits are zeros.
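A sketch of this Level 4 encoding, under several assumptions: the fields are binary64 written most significant octet first, "payload" means the 51 trailing-significand bits after the quiet bit, "first 8 bits" means the most significant 8 of those, and the decoration octet value is supplied by the caller (0x10 in the usage line is only an illustrative code). The function names are hypothetical.

```python
import struct

EXP_MASK = 0x7FF << 52          # all-ones exponent field of binary64
QUIET_BIT = 1 << 51             # leading trailing-significand bit; 0 means signaling
DEC_SHIFT = 51 - 8              # decoration octet occupies the top 8 payload bits
REST_MASK = (1 << DEC_SHIFT) - 1

def encode_decoration(dec_octet):
    """Encode a decoration as a positive sNaN field (8 octets, MSB first)."""
    if not 0 < dec_octet <= 0xFF:
        # 0 would leave the trailing significand all zeros: that is +infinity
        raise ValueError("decoration code must be in 1..255")
    return struct.pack(">Q", EXP_MASK | (dec_octet << DEC_SHIFT))

def decode_field(field):
    """Return the decoration octet if the field is such an sNaN, else None."""
    (bits,) = struct.unpack(">Q", field)
    if bits >> 63 or (bits & EXP_MASK) != EXP_MASK or (bits & QUIET_BIT):
        return None             # negative, finite/infinite-exponent mismatch, or quiet
    dec = (bits >> DEC_SHIFT) & 0xFF
    if dec == 0 or (bits & REST_MASK) != 0:
        return None             # infinity, or trailing bits not zero
    return dec

def encode_compressed(x):
    """x is either a pair (inf, sup) of floats or a decoration octet."""
    if isinstance(x, tuple):
        return struct.pack(">dd", *x)
    field = encode_decoration(x)
    return field + field        # (sNaN, sNaN) with identical payloads
```

For instance `encode_decoration(0x10)` yields the octets `7f f0 80 00 00 00 00 00`. Because the quiet bit is 0, a decoration octet of 0 would produce the bit pattern of +infinity rather than a NaN; the sketch rejects that case instead of guessing how the proposal would handle it.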

> Btw, I agree with Vincent's later comment that we should more clearly
> state that the interchange format does not attempt to dictate memory
> layout except for the purpose of exporting the data to another medium,
> e.g. a network packet or file stream -- and that we should universally
> use "octets" instead of "bytes", as Dima has started to do.

Ok.

Michel.
---Sent: 2014-06-23 14:03:51 UTC