
Re: ... replacement for 14.4 and C6.2 (interchange encodings)



Michel,

I like this text in general.
However, I would like it to be defined in terms of "octets in media", without referring
to platform or C types like char or integer.
I prefer encoding the decoration as a single octet (an 8-bit string),
but we can say that the octet stream may contain padding zero octets
at specified positions.

   Interchange Level 4 encoding of an interval datum is a bit string
   that comprises, in the order defined above, the 754-2008 interchange
   encodings of the two floating-datums, and of the decoration represented
   as a small integer in an 8-bit string, as follows:
      ill  00000000
      trv  00000100
      def  00001000
      dac  00001100
      com  00010000
   This encoding permits future refinement without disturbing the natural
   propagation order of the decorations.
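A minimal Python sketch of the proposed table, for illustration only (the names DECORATION_CODES, decoration_octet and as_bit_string are hypothetical). It checks that the codes follow the listed order and that each fits in one octet, leaving the intermediate values free for future refinement:

```python
# Hypothetical table of the proposed decoration octets (values from the text above).
DECORATION_CODES = {"ill": 0b00000000, "trv": 0b00000100, "def": 0b00001000,
                    "dac": 0b00001100, "com": 0b00010000}

# Decorations in the order listed in the proposal.
ORDER = ["ill", "trv", "def", "dac", "com"]

def decoration_octet(name):
    """Return the single-octet encoding of a decoration (hypothetical helper)."""
    return bytes([DECORATION_CODES[name]])

def as_bit_string(name):
    """Render a decoration code as the 8-bit string used in the table."""
    return format(DECORATION_CODES[name], "08b")

codes = [DECORATION_CODES[d] for d in ORDER]
assert codes == sorted(codes)               # codes increase in the listed order
assert all(0 <= c <= 0xFF for c in codes)   # every code fits in one octet
```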

   Export and import of interchange formats normally occurs as a stream
   of octets (8-bit bytes), e.g. in a file or a network packet.  There
   is therefore a need to define the mapping of the conceptual Level 4
   bit strings (as specified by 754-2008) and decoration bit strings
   (out of scope for 754-2008) into a sequence of octets, possibly with
   padding zero octets inserted for alignment.
   There is also the fact that 754-2008 defines two distinct
   encodings of decimal formats, called BID and DPD.


=====

We now consider the media as a stream of octets, with the field encodings
placed into this stream in interleaved order:
inf_0, sup_0, dec_0, inf_1, sup_1, dec_1, ...
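As an illustration of this interleaved placement, here is a small sketch using Python's struct module. It assumes binary64 bounds in big-endian order and a one-octet decoration with no padding, so each record takes 8 + 8 + 1 = 17 octets; write_interleaved is a hypothetical name:

```python
import struct

def write_interleaved(intervals):
    """Hypothetical helper: serialize (inf, sup, dec) triples as inf_0, sup_0, dec_0, inf_1, ...

    Assumes binary64 bounds in big-endian order ('>d') and the decoration
    as one unsigned octet ('B'); the '>' prefix also disables alignment padding.
    """
    out = bytearray()
    for inf, sup, dec in intervals:
        out += struct.pack(">ddB", inf, sup, dec)
    return bytes(out)

media = write_interleaved([(1.0, 2.0, 16), (-0.5, 0.5, 12)])
```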

I have tried to think about another view of the media,
though I am not currently sure whether to include this in the standard.
The media is conceptually an array of octets;
media[i] is the octet at position i.

A floating-point encoding is a bit string whose length N is a multiple of 8,
so it can be viewed as an array of N/8 octets:
octet[0] contains the sign bit and the most significant bits of the exponent;
octet[N/8-1] contains the least significant bits of the mantissa.
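For binary64 (N = 64) this octet view can be checked directly, since Python's struct with the '>' prefix emits the octets in exactly this order. A small sketch (octets_be is a hypothetical name):

```python
import struct

def octets_be(x):
    """Hypothetical helper: the N/8 = 8 octets of a binary64 encoding, octet[0] first."""
    return struct.pack(">d", x)

octet = octets_be(-2.0)
# -2.0 has sign bit 1 and biased exponent 1024 (0x400), so octet[0] holds
# the sign bit followed by the 7 most significant exponent bits.
sign = octet[0] >> 7
exp_msbs = octet[0] & 0x7F

# 1 + 2**-52 differs from 1.0 only in the last mantissa bit, found in octet[7].
lsb = octets_be(1.0 + 2**-52)[7]
```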

Reading a floating-point encoding from the media is expressed by a function octet[] = readfp(media, base, endianness):
readfp(media, base, be)[i] = media[base + i]
readfp(media, base, le)[i] = media[base + N/8 - 1 - i]
Reading a decoration encoding is expressed by a function octet = readdec(media, base):
readdec(media, base) = media[base]
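The two reading functions transcribe almost literally. In this sketch the media is any bytes-like object, n_bits stands for N, and passing endianness as the strings "be"/"le" is an assumption; octets_to_binary64 is a hypothetical helper:

```python
import struct

def readfp(media, base, endianness, n_bits=64):
    """Return the N/8 octets of an encoding in canonical order (octet[0] = sign/exponent MSBs)."""
    size = n_bits // 8
    if endianness == "be":
        return bytes(media[base + i] for i in range(size))
    if endianness == "le":
        return bytes(media[base + size - 1 - i] for i in range(size))
    raise ValueError("endianness must be 'be' or 'le'")

def readdec(media, base):
    """Return the single decoration octet stored at media[base]."""
    return media[base]

def octets_to_binary64(octets):
    """Hypothetical helper: interpret canonical octets as a binary64 value."""
    return struct.unpack(">d", octets)[0]
```

Because readfp always returns the octets in canonical order, the same decoder works for either media byte order.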

Binary representation is intended for bulk exchange, so consider an array of L intervals:
int[k] = (inf[k], sup[k], dec[k]).

Reading of the array can then be parameterized by seven parameters:
endianness, inf-base, inf-step, sup-base, sup-step, dec-base, dec-step.

inf[k] = readfp(media, inf-base + k * inf-step, endianness)
sup[k] = readfp(media, sup-base + k * sup-step, endianness)
dec[k] = readdec(media, dec-base + k * dec-step)
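Putting these together, reading L intervals with the seven parameters might be sketched as follows. Binary64 (N = 64) is assumed, read_intervals is a hypothetical name, and the hyphenated parameter names become underscores in Python:

```python
import struct

def readfp(media, base, endianness, n_bits=64):
    """Canonical-order octets of one encoding; 'le' reverses the octets."""
    size = n_bits // 8
    idx = range(size) if endianness == "be" else range(size - 1, -1, -1)
    return bytes(media[base + i] for i in idx)

def readdec(media, base):
    return media[base]

def read_intervals(media, count, endianness, inf_base, inf_step,
                   sup_base, sup_step, dec_base, dec_step):
    """Hypothetical helper: (inf, sup, dec) triples read with base/step addressing."""
    result = []
    for k in range(count):
        inf = struct.unpack(">d", readfp(media, inf_base + k * inf_step, endianness))[0]
        sup = struct.unpack(">d", readfp(media, sup_base + k * sup_step, endianness))[0]
        dec = readdec(media, dec_base + k * dec_step)
        result.append((inf, sup, dec))
    return result
```

For example, the interleaved 17-octet records shown earlier correspond to inf-base = 0, sup-base = 8, dec-base = 16, and all steps equal to 17.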

This scheme is similar to the BLAS layout (base plus stride), but applied to media rather than to a stream.

This allows, for example, the following layouts:

   infs|sups|decs layout
inf-base = array-base 
inf-step = N/8
sup-base = array-base + N*L/8
sup-step = N/8
dec-base = array-base + N*L/4
dec-step = 1
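A sketch of this layout for binary64 (N = 64) with array-base taken as 0. The helper infs_sups_decs_params is hypothetical, and struct.unpack_from stands in for readfp in the big-endian case:

```python
import struct

def infs_sups_decs_params(n_bits, length, array_base=0):
    """Hypothetical helper: base/step parameters (in octets) for infs|sups|decs."""
    w = n_bits // 8
    return dict(inf_base=array_base, inf_step=w,
                sup_base=array_base + n_bits * length // 8, sup_step=w,
                dec_base=array_base + n_bits * length // 4, dec_step=1)

intervals = [(1.0, 2.0, 16), (0.0, 0.5, 12), (-4.0, -1.0, 8)]
L = len(intervals)
# All infs, then all sups, then all decoration octets.
media = (b"".join(struct.pack(">d", lo) for lo, _, _ in intervals)
         + b"".join(struct.pack(">d", hi) for _, hi, _ in intervals)
         + bytes(d for _, _, d in intervals))
p = infs_sups_decs_params(64, L)
decoded = [(struct.unpack_from(">d", media, p["inf_base"] + k * p["inf_step"])[0],
            struct.unpack_from(">d", media, p["sup_base"] + k * p["sup_step"])[0],
            media[p["dec_base"] + k * p["dec_step"]])
           for k in range(L)]
```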

   infs-sups|decs layout

inf-base = array-base
inf-step = N/4
sup-base = array-base + N/8
sup-step = N/4
dec-base = array-base + N*L/4
dec-step = 1
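A similar sketch for the infs-sups|decs layout, again for binary64 with array-base 0. Each inf/sup pair occupies N/4 octets, and each sup sits N/8 octets after its inf within the pair; pairs_then_decs_params is a hypothetical name:

```python
import struct

def pairs_then_decs_params(n_bits, length, array_base=0):
    """Hypothetical helper: parameters for inf/sup pairs followed by all decorations."""
    w = n_bits // 8
    return dict(inf_base=array_base, inf_step=2 * w,
                sup_base=array_base + w, sup_step=2 * w,
                dec_base=array_base + n_bits * length // 4, dec_step=1)

intervals = [(1.0, 2.0, 16), (0.0, 0.5, 12), (-4.0, -1.0, 8)]
L = len(intervals)
media = (b"".join(struct.pack(">dd", lo, hi) for lo, hi, _ in intervals)
         + bytes(d for _, _, d in intervals))
p = pairs_then_decs_params(64, L)
decoded = [(struct.unpack_from(">d", media, p["inf_base"] + k * p["inf_step"])[0],
            struct.unpack_from(">d", media, p["sup_base"] + k * p["sup_step"])[0],
            media[p["dec_base"] + k * p["dec_step"]])
           for k in range(L)]
```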

  The former ieee754_encN_be_dM signature corresponds to the following layout:
endianess = be
inf-base = array-base
inf-step = N/4+M/8
sup-base = array-base + N/8
sup-step = N/4+M/8
dec-base = array-base + N/4 + M/8 - 1
dec-step = N/4+M/8
Media octets media[array-base + k*(N/4+M/8) + N/4 : array-base + k*(N/4+M/8) + N/4 + M/8 - 2]
(bounds inclusive) store no information.
These are padding octets; they are either garbage or zero octets.
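As a sketch, take N = 64 and M = 16, with the decoration written as a big-endian 16-bit integer (struct format 'H'). The value octet is then the last octet of each 18-octet record and the octet before it is padding. enc_be_params is a hypothetical helper:

```python
import struct

def enc_be_params(n_bits, m_bits, array_base=0):
    """Hypothetical helper: parameters for the former ieee754_encN_be_dM record layout."""
    rec = n_bits // 4 + m_bits // 8          # octets per record
    return dict(inf_base=array_base, inf_step=rec,
                sup_base=array_base + n_bits // 8, sup_step=rec,
                dec_base=array_base + n_bits // 4 + m_bits // 8 - 1, dec_step=rec)

# Two binary64 intervals, each followed by a 16-bit big-endian decoration.
records = struct.pack(">ddH", 1.0, 2.0, 16) + struct.pack(">ddH", -1.0, 0.0, 4)
p = enc_be_params(64, 16)
decs = [records[p["dec_base"] + k * p["dec_step"]] for k in range(2)]
```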

  The former ieee754_encN_le_dM signature corresponds to the following layout:
endianess = le
inf-base = array-base
inf-step = N/4+M/8
sup-base = array-base + N/8
sup-step = N/4+M/8
dec-base = array-base + N/4
dec-step = N/4+M/8
Media octets media[array-base + k*(N/4+M/8) + N/4 + 1 : array-base + k*(N/4+M/8) + N/4 + M/8 - 1]
(bounds inclusive) store no information.
These are padding octets; they are either garbage or zero octets.
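The little-endian counterpart, under the same assumptions (N = 64, M = 16, hypothetical helper enc_le_params): the decoration value now sits in the first octet after the bounds and the padding follows it. Here struct.unpack_from with '<d' plays the role of readfp(..., le):

```python
import struct

def enc_le_params(n_bits, m_bits, array_base=0):
    """Hypothetical helper: parameters for the former ieee754_encN_le_dM record layout."""
    rec = n_bits // 4 + m_bits // 8
    return dict(inf_base=array_base, inf_step=rec,
                sup_base=array_base + n_bits // 8, sup_step=rec,
                dec_base=array_base + n_bits // 4, dec_step=rec)

media = struct.pack("<ddH", 1.0, 2.0, 16) + struct.pack("<ddH", -1.0, 0.0, 4)
p = enc_le_params(64, 16)

def dec_at(k):
    """Decoration octet of record k (hypothetical helper)."""
    return media[p["dec_base"] + k * p["dec_step"]]
```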

As a bonus, such parameterization allows matrices to be stored on disk media and individual slices to be retrieved (as BLAS does).
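For instance, with a row-major matrix of interleaved 17-octet records (binary64, big-endian, one-octet decoration), a column is read with step = columns * 17 and a row with step = 17. Since one struct format reads the whole record, inf-base, sup-base and dec-base are the record base plus 0, 8 and 16, all sharing one step. The helper names are hypothetical:

```python
import struct

REC = 17  # binary64 inf + binary64 sup + one decoration octet, big-endian, no padding

def store_matrix(rows):
    """Hypothetical helper: store a row-major matrix of (inf, sup, dec) intervals."""
    return b"".join(struct.pack(">ddB", *iv) for row in rows for iv in row)

def read_slice(media, count, base, step):
    """Hypothetical helper: read `count` records starting at `base`, `step` octets apart."""
    return [struct.unpack_from(">ddB", media, base + k * step) for k in range(count)]

rows = [[(1.0, 1.5, 16), (2.0, 2.5, 16), (3.0, 3.5, 12)],
        [(4.0, 4.5, 8), (5.0, 5.5, 16), (6.0, 6.5, 4)]]
media = store_matrix(rows)
n_rows, n_cols = len(rows), len(rows[0])
column_1 = read_slice(media, n_rows, base=1 * REC, step=n_cols * REC)
row_1 = read_slice(media, n_cols, base=1 * n_cols * REC, step=REC)
```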

=====

> We can assume
> that numeric (non-empty) intervals are represented at Level 3 like other
> bare intervals, but the encoding of Empty may be different

The representation of NaI is (NaN, NaN, ILL);
the representation of any other Empty is (+Inf, -Inf, dec).
Other Empty intervals can still be encoded as (+Inf, -Inf).
We might change NaI to something like (NaN, ILL), or to (NaN+ILL, NaN+ILL),
where NaN+ILL is a NaN whose payload carries ILL.
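To make the NaN+ILL idea concrete, the sketch below builds binary64 bit patterns as integers. This avoids passing payloads through floating-point registers, where their handling is platform-dependent. It stores a decoration code in the low payload bits of a quiet NaN. This payload placement is only an assumption for illustration, and the helper names are hypothetical. If ill is coded as 0, NaN+ILL coincides with the default quiet NaN:

```python
import math
import struct

QUIET_NAN = 0x7FF8000000000000     # binary64 quiet NaN with zero payload
PAYLOAD_MASK = 0x0007FFFFFFFFFFFF  # trailing significand bits below the quiet bit

def nan_with_decoration(code):
    """Hypothetical helper: big-endian octets of a quiet NaN whose payload is `code`."""
    return (QUIET_NAN | (code & PAYLOAD_MASK)).to_bytes(8, "big")

def decoration_from_nan(octets):
    """Hypothetical helper: recover the payload, or None if the octets are not a NaN."""
    if not math.isnan(struct.unpack(">d", octets)[0]):
        return None
    return int.from_bytes(octets, "big") & PAYLOAD_MASK
```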

  -Dima

----- Original Message -----
From: mhack@xxxxxxx
To: stds-1788@xxxxxxxxxxxxxxxxx
Sent: Thursday, June 26, 2014 7:39:01 AM GMT +04:00 Abu Dhabi / Muscat
Subject: Re:... replacement for 14.4 and C6.2 (interchange encodings)

I decided to dial back; the specification was getting way too complicated.

The crucial revelation to me was the fact that applications already need
to communicate, or agree upon, what is being exchanged, and the only new
thing is the need to communicate in addition the parameters used in the
mapping from conceptual Level 4 representations to a sequence of octets
suitable for communication via file or network.

Issues such as which Level 2 types are being used, and whether decorated,
bare, or compressed intervals are being exchanged, are separate from issues
of platform dependency, and need no special support from the standard.

We may want to add that the interchange format can also be used for bare
decorations, and possibly also for compressed decorations, though in this
last case there are additional representation parameters that need to be
passed along, namely how decorations are encoded.  Indeed, Level 2 claims
to make "no further requirements" than in 11.10 (see 12.1).  Apparently
this also absolves Levels 3 and 4 of any responsibility -- but that leaves
a big gap with respect to the representation of decorations.  We can assume
that numeric (non-empty) intervals are represented at Level 3 like other
bare intervals, but the encoding of Empty may be different, and there are
several possibilities for encoding decorations, perhaps as NaN payloads,
or as small FP integers in one bound with a generic NaN in the other (to
avoid the many portability issues with NaN payloads).  It seems to me that
we need a clause 14.5 "interchange representations of optional compressed
intervals".  I'll give it a shot.  Meanwhile, I enclose below my simplified
version of the 14.4 and C6.2 rewrites.

Michel.

Enclosed: (88 lines)

The first three paragraphs of 14.4 are ok.  I start with rewriting the
fourth paragraph, which currently begins the same way:

   Interchange Level 4 encoding of an interval datum is a bit string
   that comprises, in the order defined above, the 754-2008 interchange
   encodings of the two floating-datums, and of the decoration represented
   as a small integer, as follows:
      ill    0
      trv    4
      def    8
      dac   12
      com   16
   This encoding permits future refinement without disturbing the natural
   propagation order of the decorations, and fits within the range of a C
   signed char, namely 0..127.

   Export and import of interchange formats normally occurs as a stream
   of octets (8-bit bytes), e.g. in a file or a network packet.  There
   is therefore a need to define the mapping of the conceptual Level 4
   bit strings (as specified by 754-2008) and of the small integers used
   to encode decorations (out of scope for 754-2008) into a sequence of
   octets.  There is also the fact that 754-2008 defines two distinct
   encodings of decimal formats, called BID and DPD.

   Applications exchanging data need to describe the types and layout
   thereof; standards like this one (and 754-2008) only define the
   representation of individual datums, and then only as conceptual
   Level 4 entities.  Environments whose primary focus is universal
   portability (e.g. Java) may fully define the representation at the
   level of a sequence of octets, even when this does not match the
   natural in-memory layout of the platform.  This standard takes the
   more lenient view that the parameters of the mapping to a stream of
   octets be communicated, which may avoid conversion costs when bulk
   data are exchanged among like-minded systems.  Those parameters
   are (*footnote):

   * Byte order (Endianness) of the floating-point datums:
        Big Endian, Little Endian, or mixed (in which case
        it might be useful to export a template).

   * Size (in octets) and possibly byte order of the decoration
        (if not a single octet, e.g. for padding reasons, or
        because the platform's small integer type is bigger)

   * For decimal formats, whether BID or DPD encoding is used.
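As an illustration only, these parameters could be carried in a small descriptor. ExportParams and export_binary64_interval are hypothetical, as the standard deliberately leaves the conveyance unspecified. Mixed byte orders and the BID/DPD choice are recorded but not modelled below:

```python
import struct
from dataclasses import dataclass

@dataclass(frozen=True)
class ExportParams:
    """Hypothetical descriptor for the mapping parameters listed above."""
    fp_byte_order: str              # "big" or "little" (mixed orders not modelled)
    dec_size: int = 1               # decoration size in octets
    dec_byte_order: str = "big"     # only matters when dec_size > 1
    decimal_encoding: str = "BID"   # relevant to decimal formats only; unused here

def export_binary64_interval(inf, sup, dec, p):
    """Hypothetical helper: encode one binary64 decorated interval according to p."""
    fmt = ">dd" if p.fp_byte_order == "big" else "<dd"
    return struct.pack(fmt, inf, sup) + dec.to_bytes(p.dec_size, p.dec_byte_order)
```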

   The standard does not define how this information is to be conveyed.
   This is not a limitation because applications must already agree on
   what it is that is being exchanged.

*footnote:  Those parameters describe the exported data, not necessarily
            what the platform's native representation is.  In other words,
            the possibility of universally portable representations, e.g.
            as used by Java, is included.


The original text resumes with the Example, except that there should be
two versions following the two-line Level-3 description.  In each instance
"The interchange encodings" is to be replaced with:
  The Big-Endian interchange encodings...     (example as given)
  The Little-Endian interchange encodings...  (on each line, reverse the
                                               eight-bit bytes, but not
                                               the bits within a byte)
(Also remember to recode the com decoration as 0010000 instead of 1000000)
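The Little-Endian lines can be derived mechanically from the Big-Endian ones by reversing the order of the octets while leaving the bits within each octet alone. A minimal sketch for one binary64 value (to_little_endian is a hypothetical name):

```python
import struct

def to_little_endian(be_octets):
    """Hypothetical helper: reverse octet order; bits inside each octet are unchanged."""
    return be_octets[::-1]

x = 0.1
be = struct.pack(">d", x)
le = to_little_endian(be)
```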


Similarly, the first three paragraphs of C6.2 are ok, and my rewrite starts
with the fourth paragraph (taken from the 14.4 rewrite above):

   Interchange Level 4 encoding of an interval datum is a bit string
   that comprises, in the order defined above, the 754-2008 interchange
   encodings of the two floating-datums, and of the decoration represented
   as a small integer, as follows:
      ill    0
      trv    4
      def    8
      dac   12
      com   16
   This encoding permits future refinement without disturbing the natural
   propagation order of the decorations, and fits within the range of a C
   signed char, namely 0..127.

   The decoration should be encoded in a single octet, and the byte order
   of the two floating-point datums should be noted among the information
   used to describe the exported data.

The original text resumes with the Example, except that there should be
two versions, Big-Endian and Little-Endian, as described above for 14.4.
---Sent: 2014-06-26 03:38:23 UTC