
Re: ... replacement for 14.4 and C6.2 (interchange encodings)

To: <j.d.pryce@xxxxxxxxxx>
Subject: Re: ... replacement for 14.4 and C6.2 (interchange encodings)
From: Dmitry Nadezhin <dmitry.nadezhin@xxxxxxxxxx>
Date: Thu, 26 Jun 2014 02:14:28 -0700 (PDT)
Cc: <mhack@xxxxxxx>, <stds-1788@xxxxxxxxxxxxxxxxx>

John, Michel, Ned,

I tried to merge Michel's text with John's and my changes.
The text is in -r372, though you might want to change it further.

My changes are:
1) The word "octets-encoding";
2) Decoration is a bit string;
3) "Octets-encoding" might insert padding zero octets for alignment;
4) Width of the floating-point interchange format is also a parameter;
5) Example contains:
  - one Level 3 representation;
  - one Level 4 encoding;
  - two octets-encodings, of the fields and of the interval;
6) The Basic standard describes only the octets-encoding and omits the Level 4 encoding for simplicity.
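
As a rough illustration of items 3) to 5), here is a minimal Python sketch of an octets-encoding. It is not text from the draft: the helper name encode_interval and its parameters (byte_order, width, pad_before, pad_after) are hypothetical, chosen only to mirror the parameters listed in the patch. The decoration octets are taken from the table in the patch.

```python
import struct

# Decoration octets, as in the table of the attached patch.
DECORATION = {
    "ill": 0b00000000,
    "trv": 0b00000100,
    "def": 0b00001000,
    "dac": 0b00001100,
    "com": 0b00010000,
}

def encode_interval(lo, hi, dec, byte_order="big", width=64,
                    pad_before=0, pad_after=0):
    """Hypothetical helper: octets-encoding of a decorated inf-sup interval.

    Assumes a binary format of the given width (32 or 64 bits) and zero
    padding octets placed immediately before or after the decoration octet.
    """
    fmt = {32: "f", 64: "d"}[width]
    prefix = {"big": ">", "little": "<"}[byte_order]
    bounds = struct.pack(prefix + fmt, lo) + struct.pack(prefix + fmt, hi)
    return bounds + bytes(pad_before) + bytes([DECORATION[dec]]) + bytes(pad_after)

big = encode_interval(-1.0, 3.0, "com", byte_order="big")
little = encode_interval(-1.0, 3.0, "com", byte_order="little")

# Print each encoding as space-separated octets, as in the examples of the patch.
print(" ".join(f"{octet:08b}" for octet in big))
print(" ".join(f"{octet:08b}" for octet in little))
```

With width=64 this should give the 17 octets of the Basic standard example, and with width=32 the 9 octets of the binary32 example; only the order of octets within each bound changes with the byte order.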

> Dmitry, you don't make it clear whether your "We now consider media as a stream of octets..."
> is intended as a first attempt at text to be included in the standard.

No. It was one of the motivations for defining the decoration encoding as an octet, not as an integer.
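
One way to see the difference, as a sketch only (the 4-byte integer width below is an assumption for illustration, not something the draft specifies): a single octet looks the same in every byte order, while a multi-byte integer does not.

```python
import struct

COM = 0b00010000  # decoration com from the table in the patch (value 16)

# As a single octet there is nothing to reorder.
as_octet = bytes([COM])

# As a 4-byte integer (assumed width) the octet sequence depends on byte order.
as_int32_big = struct.pack(">i", COM)
as_int32_little = struct.pack("<i", COM)

print(as_octet.hex())         # 10
print(as_int32_big.hex())     # 00000010
print(as_int32_little.hex())  # 10000000
```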

> I don't know if you have an English spell-checker
I haven't set one up yet. Perhaps next time. Sorry about this.

  -Dima

----- Original Message -----
From: j.d.pryce@xxxxxxxxxx
To: mhack@xxxxxxx
Cc: stds-1788@xxxxxxxxxxxxxxxxx
Sent: Thursday, June 26, 2014 12:46:09 PM GMT +04:00 Abu Dhabi / Muscat
Subject: Re: ... replacement for 14.4 and C6.2 (interchange encodings)

Michel, Dmitry

On 2014 Jun 26, at 04:05, Michel Hack wrote:
> I decided to dial back;...
Not a British idiom! I assume it means something like "reduce the volume" rather than "return your phone call".

> the specification was getting way too complicated.
Yes, and therefore I like 
"The standard does not define how this information is to be conveyed".

Dmitry, you don't make it clear whether your "We now consider media as a stream of octets..." is intended as a first attempt at text to be included in the standard. No, IMO; we haven't time to verify its details. 

A proposed scheme of this kind should go on the 1788 web site, as help for implementers.

A few suggestions on Michel's text.
(1) "This encoding permits future refinement without disturbing the natural
  propagation order of the decorations"
  Delete "natural"; also in C6.2.

(2) "out of scope for 754-2008" -> "not specified in 754-2008"

(3) "There is also the fact that ..." doesn't sound very standardese. How about
  "The two distinct encodings of decimal formats in 754-2008, called BID and DPD, need to be distinguished."

Why not delete the whole sentence since, below,
"* For decimal formats, whether BID or DPD encoding is used."
says enough.

(4) "This standard takes the more lenient view ..." also not very standardese. Try
"This standard requires the parameters of the mapping to a stream of octets to be communicated ..."
Maybe add "merely" before "requires".

How standardese is "like-minded"? Try "similar"?

Dmitry, I look forward to your revised text. BTW I don't know if you have an English spell-checker; if so, it could save us a round of revision, valuable when time is short.

John

Attachment: octets2.pdf
Description: Adobe PDF document

Index: P1788G_level3_B.tex
===================================================================
--- P1788G_level3_B.tex	(revision 371)
+++ P1788G_level3_B.tex	(working copy)
@@ -65,34 +65,47 @@
 For example, the only representative of $\Emp_{\Dtrv}$ is the triple $(\pinf,\ninf,\Dtrv)$, and
 the only representative of \nai is the triple $(\nan,\nan,\Dill)$.
 
-
-
-Interchange Level 4 encoding of an interval datum is a bit string that comprises the interchange encodings of the fields of the ordered pair/triple above (in the  order given).
-The \@b64@ datums are encoded as in the IEEE 754 interchange format: a sign bit,
-followed by 11 exponent bits that describe the exponent offset by a bias, and 52 bits that describe the significand 
-(the least significant bit is last).
-%the first bit is the sign bit and the last bit is the least significant bit (LSB) of the significand. %The choice of interchange format and the choice between densely packed decimal (DPD) or binary integer decimal (BID) significand encoding for decimal 754 interchange formats (754 \S 3.5) are parameters of interchange encoding.
-The decoration is encoded in an 8-bit string as specified below: 
+Interchange octets-encoding of an interval datum is a sequence of octets (8-bit bytes)
+that comprises, in the order defined above, the interchange
+octets-encodings of the two \@b64@ datums, and, for decorated intervals, of the decoration represented
+as an octet, as follows:
+%as a small integer, as follows:
 \bc
-\begin{tabular}{c|c}
-   Decoration     & Encoding \\\hline
-   \Dill          & 00000000 \\
-   \Dtrv          & 00100000 \\
-   \Ddef          & 01000000 \\
-   \Ddac          & 01100000 \\
-   \Dcom          & 10000000 \\
+\begin{tabular}{cc}
+   \Dill &  00000000 \\
+   \Dtrv &  00000100 \\
+   \Ddef &  00001000 \\
+   \Ddac &  00001100 \\
+   \Dcom &  00010000
 \end{tabular}
-%\begin{tabular}{l|c|c|c|c|c}
-%   Decoration     & \Dill & \Dtrv & \Ddef & \Ddac & \Dcom \\\hline
-%   Representation &     0 &     1 &     2 &     3 &     4 \\
+%\begin{tabular}{cc}
+%   \Dill &   0 \\
+%   \Dtrv &   4 \\
+%   \Ddef &   8 \\
+%   \Ddac &  12 \\
+%   \Dcom &  16
 %\end{tabular}
 \ec
+This encoding permits future refinement without disturbing the
+propagation order of the decorations.
 
+The octets-encoding of \@b64@ datums is eight octets obtained from 64 bits of the IEEE 754-2008 interchange format: a sign bit, 
+followed by 11 exponent bits that describe the exponent offset by a bias, and 52 bits that describe the significand 
+(the least significant bit is last). The order of octets differs between the Big-Endian octets-encoding (the first octet contains the sign bit and the 7 MSB exponent bits)
+and the Little-Endian octets-encoding (the first octet contains the 8 LSB significand bits).
+
+%Interchange Level 4 encoding of an interval datum is a bit string that comprises the interchange encodings of the fields of the ordered pair/triple above (in the  order given).
+%The \@b64@ datums are encoded as in the IEEE 754 interchange format: a sign bit,
+%followed by 11 exponent bits that describe the exponent offset by a bias, and 52 bits that describe the significand 
+%(the least significant bit is last).
+%%the first bit is the sign bit and the last bit is the least significant bit (LSB) of the significand. %The choice of interchange format and the choice between densely packed decimal (DPD) or binary integer decimal (BID) significand encoding for decimal 754 interchange formats (754 \S 3.5) are parameters of interchange encoding.
+
 \medskip
 
 \example{
-The interchange encoding of  $[-1,3]_\Dcom$ 
-are the concatenated bit strings below (without the spaces) 
+The Big-Endian interchange octets-encoding of  $[-1,3]_\Dcom$ 
+are the concatenated octet sequences below 
+%are the concatenated bit strings below (without the spaces) 
 
 \begin{tabular}{rl}
 $-1$&   10111111 11110000 00000000 00000000
@@ -100,9 +113,21 @@
 $3$ &  01000000 00001000 00000000 00000000
 00000000 00000000 00000000 00000000
 \\
-   \Dcom &          10000000 
+   \Dcom &          00010000 
 \end{tabular}
 
+The Little-Endian interchange octets-encoding of  $[-1,3]_\Dcom$ 
+are the concatenated octet sequences below 
+%are the concatenated bit strings below (without the spaces) 
+
+\begin{tabular}{rl}
+$-1$&   00000000 00000000
+00000000 00000000 00000000 00000000 11110000 10111111\\
+$3$ &  00000000 00000000
+00000000 00000000 00000000 00000000 00001000 01000000
+\\
+   \Dcom &          00010000 
+\end{tabular}
  
 }
 
Index: P1788G_level3.tex
===================================================================
--- P1788G_level3.tex	(revision 371)
+++ P1788G_level3.tex	(working copy)
@@ -116,99 +116,125 @@
 For example, the only representative of $\Emp_{\Dtrv}$ is the triple $(\pinf,\ninf,\Dtrv)$ and the only representative of \nai is the triple $(\nan,\nan,\Dill)$.
 
 %-----
-Interchange Level 4 encoding of an interval datum is a bit string that
-comprises, in the order defined above, the platform's representation
-of the 754-2008 interchange encodings of the two floating-datums, and
-of the decoration represented as a small integer, as detailed below.
-Since the 754-2008 standard supports two different encodings for
-decimal formats, and does not address the Endianness issue at all,
-this standard defines a {\bf standard type signature}, described below,
-which incorporates the parameters of the encoding.
-
-Interchange encoding of the decoration field is as a small integer:
+Interchange Level 4 encoding of an interval datum is a bit string
+that comprises, in the order defined above, the 754-2008 interchange
+encodings of the two floating-datums, and, for decorated intervals, of the decoration represented
+as a bit string of length 8, as follows:
+%as a small integer, as follows:
 \bc
 \begin{tabular}{cc}
-   \Dill &   0 \\
-   \Dtrv &   4 \\
-   \Ddef &   8 \\
-   \Ddac &  12 \\
-   \Dcom &  16
+   \Dill &  00000000 \\
+   \Dtrv &  00000100 \\
+   \Ddef &  00001000 \\
+   \Ddac &  00001100 \\
+   \Dcom &  00010000
 \end{tabular}
+%\begin{tabular}{cc}
+%   \Dill &   0 \\
+%   \Dtrv &   4 \\
+%   \Ddef &   8 \\
+%   \Ddac &  12 \\
+%   \Dcom &  16
+%\end{tabular}
 \ec
-This encoding permits future refinement while preserving the
-propagation order of the decorations, and fits within the range of a C
-signed char, namely $0..127$.
+This encoding permits future refinement without disturbing the
+propagation order of the decorations.
+% , and fits within the range of a C signed char, namely $0..127$.
 
-A 754-conforming implementation shall provide, for each interchange
-encoding, a standard type signature, which is a 16-byte string laid
-out as follows:
-\bc
-\begin{tabular}{cl}
-   \@ieee1788@                & 8 bytes, ASCII, any case \\
- \@bin@ $|$ \@bid@ $|$ \@dpd@ & 3 bytes, ASCII, defines FP encoding \\
-   \verb'\0'                  & 1 byte, null char (8 zero bits) \\
-   \it nnnn                   & 4 bytes, size in bytes, stored as native int32
-\end{tabular}
-\ec
-The size is that of the interchange object, consisting of the two
-floating-point datums and the decoration.  From this the size of
-an individual floating-point datum and of the decoration can be
-derived (assuming only that the former is a multiple of the latter,
-and that the size of the latter is a power of two), as well as the
-Endianness of the representation (assuming that the size is less
-than $2^{24}$, which is presumably large enough to cover any extended
-format being contemplated in the near future).
+Export and import of interchange formats normally occurs as a stream
+of octets (8-bit bytes), e.g. in a file or a network packet.  There
+is therefore a need to define the mapping of the conceptual Level 4
+bit string encodings of floating-point datums (as specified by 754-2008)
+and of decorations (not specified in 754-2008) into a sequence of
+octets (octets-encoding).
+Octets-encoding might optionally insert padding zero octets for alignment.
+% bit strings (as specified by 754-2008) and of the small integers used
+% to encode decorations (out of scope for 754-2008) into a sequence of
+% octets.
+%The two distinct encodings of decimal formats in 754-2008, called BID and DPD, need to be distinguished.
 
-The flexibility in the size of the decoration is to accommodate
-alignment and padding issues -- or systems whose "small integer"
-datatype is a word and not a byte.  (It still assumes that a word
-consists of 2, 4 or 8 bytes, which seems general enough today.)
+Applications exchanging data need to describe the types and layout
+thereof; standards like this one (and 754-2008) only define the
+representation of individual datums, and then only as conceptual
+Level 4 entities.  Environments whose primary focus is universal
+portability (e.g. Java) may fully define the representation at the
+level of a sequence of octets, even when this does not match the
+natural in-memory layout of the platform.
+This standard merely requires the parameters of the octets-encoding
+%This standard takes the more lenient view that the parameters of the mapping to a stream of octets
+be communicated, which may avoid conversion costs when bulk
+data are exchanged among similar systems.  Those parameters
+are\footnote{Those parameters describe the exported data, not necessarily
+what the platform's native representation is.  In other words,
+the possibility of universally portable representations, e.g.
+as used by Java, is included.}:
+\begin{itemize}
+\item Width in bits of the floating-point interchange format (see 754 \S3.6).
+\item For decimal formats, whether BID or DPD encoding is used.
+\item Byte order (Endianness) of the floating-point datums:
+Big Endian, Little Endian, or mixed (in which case
+it might be useful to export a template).
+\item Number of optional padding zero octets inserted before or after the decoration
+(for alignment reasons).
+% Size (in octets) and possibly byte order of the decoration
+% (if not a single octet, e.g. for padding reasons, or
+% because the platform's small integer type is bigger)
+\end{itemize}
 
-For example, in C, the corresponding structure describing the type
-used in the Basic Standard (\apref{setbasedbsia}) would be:
-\begin{lstlisting}
-   struct { char    format[12];
-            int32_t itemlength; } xxx = { "ieee1788bin", 17 };
-\end{lstlisting}
-(If $l_d=2^d$ is the size of the decoration, and $l_f=k\,l_d$ is the size
-of a floating-point datum, then $\@itemlength@ = (2k+1)\,2^d$, from which
-both $k$ and $d$ can be recovered, and hence also $l_d$ and $l_f$.)
+The standard does not define how this information is to be conveyed.
+This is not a limitation because applications must already agree on
+what it is that is being exchanged.
 
-Such a type signature could be included in a header that accompanies
-interchange-encoded intervals for export, to achieve fairly universal
-portability.
-
 \example{
 The interchange representation of inf-sup \@binary32@ decorated interval $[-1,3]_\Dcom$ is a triple
 
 (-0x1.000000p0, +0x1.800000p1, \Dcom).
 
-The Big-Endian interchange encodings of its fields are these bit strings (with underscores for readability):
+The Level 4 interchange encodings of its fields are these bit strings (with underscores for readability):
 \[
 \begin{tabular}{l|l}
   --0x1.000000p0  & 10111111\_10000000\_00000000\_00000000 ,\\
    +0x1.800000p1  & 01000000\_01000000\_00000000\_00000000 ,\\
-   \Dcom          & 10000000 .
+   \Dcom          & 00010000 .
 \end{tabular}
 \]
 
-The Big-Endian interchange encoding of the interval is a bit string of length 72:
-\bc 10111111\_10000000\_00000000\_00000000\_01000000\_01000000\_00000000\_00000000\_10000000 . \ec
+The Level 4 interchange encoding of the interval is a bit string of length 72:
+\bc 10111111\_10000000\_00000000\_00000000\_01000000\_01000000\_00000000\_00000000\_00010000 . \ec
 
-The Little-Endian interchange encodings of its fields are these bit strings:
+The Big-Endian interchange octets-encodings of its fields are these sequences of octets:
 \[
-\begin{tabular}{l|l}
-  --0x1.000000p0  & 00000000\_00000000\_10000000\_10111111 ,\\
-   +0x1.800000p1  & 00000000\_00000000\_01000000\_01000000 ,\\
-   \Dcom          & 10000000 .
+\begin{tabular}{l|llll}
+  --0x1.000000p0  & 10111111 & 10000000 & 00000000 & 00000000\\
+   +0x1.800000p1  & 01000000 & 01000000 & 00000000 & 00000000\\
+   \Dcom          & 00010000
 \end{tabular}
 \]
 
-The Little-Endian interchange encoding of the interval is a bit string of length 72:
-\bc 00000000\_00000000\_10000000\_10111111\_00000000\_00000000\_01000000\_01000000\_10000000 . \ec
+The Big-Endian interchange octets-encoding of the interval is a sequence of 9 octets:
+\[
+\begin{tabular}{lllllllll}
+10111111 & 10000000 & 00000000 & 00000000 & 01000000 & 01000000 & 00000000 & 00000000 & 00010000 . \\
+\end{tabular}
+\]
+
+The Little-Endian interchange octets-encodings of its fields are sequences of octets:
+\[
+\begin{tabular}{l|llll}
+  --0x1.000000p0  & 00000000 & 00000000 & 10000000 & 10111111\\
+   +0x1.800000p1  & 00000000 & 00000000 & 01000000 & 01000000\\
+   \Dcom          & 00010000
+\end{tabular}
+\]
+
+The Little-Endian interchange octets-encoding of the interval is a sequence of 9 octets:
+\[
+\begin{tabular}{lllllllll}
+00000000 & 00000000 & 10000000 & 10111111 & 00000000 & 00000000 & 01000000 & 01000000 & 00010000 . \\
+\end{tabular}
+\]
 }
 
-
 \note{The above rules imply that an interval has a unique interchange
 representation if it is not \nai and in a binary format, but not
 generally otherwise. The reason for the rules is that the sign of a
