Re: Level 3 interchange formats (M0029.01)
On Sat, Nov 19, 2011 at 3:33 PM, Michel Hack <hack@xxxxxxxxxxxxxx> wrote:
> Lee Winter wrote:
>> On that note I recommend that the values representing the SEM
>> fields be encoded as simple unsigned integers. Such an approach
>> will not harm communication between 754-based systems but will
>> enhance communication involving non-754-based systems. I prefer
>> that the integers be represented in decimal because it is the
>> default base that trivializes the necessary IO routines.
>
> The text-form interchange format provides this, and also eliminates
> the coding issue for Empty as it could simply be given as "Empty".
> (The international community already tolerates words like Inf and NaN.)
Yes, but we need to be strict about specifying acceptable
capitalization. For example, existing implementations vary in their
tolerance for INF, Inf, inf, or other variants.
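Here is a sketch of what "strict" could mean in practice. The Python fragment below accepts exactly one spelling for each special token and rejects every other variant. The canonical spellings Inf, -Inf, NaN and Empty, and the name parse_field, are assumptions made for illustration. Nothing in this thread has fixed them. Python's own float() is the lenient kind: it accepts inf, INF, Infinity and nan regardless of case.

```python
import math
from typing import Union

# Hypothetical canonical spellings; not settled anywhere in this thread.
EMPTY = "Empty"  # sentinel returned for the empty interval
_SPECIALS = {"Inf": math.inf, "-Inf": -math.inf, "NaN": math.nan}


def parse_field(token: str) -> Union[float, str]:
    """Parse one text field, accepting only the exact canonical spellings."""
    if token == EMPTY:
        return EMPTY
    if token in _SPECIALS:
        return _SPECIALS[token]
    # Reject any other alphabetic spelling (INF, inf, nan, Infinity, ...)
    # instead of relying on float()'s case-insensitive parsing.
    # 'e'/'E' are allowed because they mark exponents, as in "1e5".
    if any(c.isalpha() and c not in "eE" for c in token):
        raise ValueError(f"non-canonical special token: {token!r}")
    return float(token)
```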
>
> The cost of transformation between internal and exchange formats must
> be considered however, and the space requirement may not be negligible
> either when large arrays are exchanged between systems. That's why a
> representation along the lines proposed by John makes sense.
This is an issue of processing efficiency. After correctness, I rank
compatibility and low implementation cost above processing efficiency.
This matters because it would be useful to avoid having a separate
encoding for each floating-point layout. The SEM approach, for example,
allows a single text representation to handle all FP layouts. The
implementation cost is therefore lower, and the chance of implementation
glitches is reduced by a factor related to the number of such layouts.
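The following Python sketch illustrates the layout-independence argument. Given only the exponent and fraction widths, it renders any IEEE 754 binary bit pattern as three unsigned decimal integers. It makes two assumptions that the thread has not settled. First, the SEM fields are taken to be the raw bit fields (biased exponent, and the fraction without the hidden bit). Second, the fields are separated by single spaces. All function names are hypothetical.

```python
import struct


def sem_fields(bits: int, exp_bits: int, frac_bits: int) -> tuple:
    """Split a raw binary FP bit pattern into (sign, biased exponent, fraction)."""
    frac = bits & ((1 << frac_bits) - 1)
    exp = (bits >> frac_bits) & ((1 << exp_bits) - 1)
    sign = bits >> (frac_bits + exp_bits)
    return sign, exp, frac


def to_sem_text(bits: int, exp_bits: int, frac_bits: int) -> str:
    """Render the three fields as unsigned decimal integers (assumed syntax)."""
    return " ".join(str(f) for f in sem_fields(bits, exp_bits, frac_bits))


def from_sem_text(text: str, exp_bits: int, frac_bits: int) -> int:
    """Reassemble the raw bit pattern from the decimal SEM text."""
    sign, exp, frac = (int(t) for t in text.split())
    return (sign << (exp_bits + frac_bits)) | (exp << frac_bits) | frac


def binary64_bits(x: float) -> int:
    """Raw bit pattern of x as an IEEE 754 binary64 value."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]


def binary32_bits(x: float) -> int:
    """Raw bit pattern of x rounded to IEEE 754 binary32."""
    return struct.unpack(">I", struct.pack(">f", x))[0]
```

The same three routines cover binary32 (widths 8 and 23) and binary64 (widths 11 and 52). For example, 1.0 comes out as "0 127 0" and "0 1023 0" respectively. Only the parameters change, so no per-layout encoding is needed.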
> It is true that being close to an internal format leads to the temptation
> of performing arithmetic on the exchange format.
I am not tempted in that direction. Who is and for what purpose?
> In a typed language we
> can keep this under control by defining syntactically separate types for
> exchange and internal formats, possibly with implicit cast conversions.
It would be possible to use the exchange syntax to specify numeric
literals, but that would be bad for the readability of source code.
> Ideally this approach should also get around NaN propagation uncertainty
> -- but if platforms have difficulty preserving NaN payloads across plain
> assignments (Java?) the <NaN,Num> encoding suggested by Nate may be what
> is needed. Even without propagation rules we have payload definition
> rule issues. Ideally we would like to encode small integers, but only
> DFP specifies the encoding; for BFP it is not specified at all. (IBM Z,
> which supports a BFP<->DFP conversion machine instruction, does specify
> the encoding for BFP to make sure small (<2**22) integer NaN payloads
> survive across any sequence of radix or size conversions, but I have
> not heard of anybody else following suit. Note that this issue also
> arises when converting between BFP and decimal strings (e.g. atof(),
> when the Posix notation "NaN(nnn)" is evaluated).)
>
> Of course, Motion 29 is deliberately silent on decoration encoding, as
> we have not yet settled on the definition. I'm afraid this is in fact
> going to be messy, as there are too many possibilities, all of them
> with considerable baggage... Perhaps text representations are all that
> will survive, after all.
>
> On this tangent, I might note that separating exchange formats from
> arithmetic formats does open some new possibilities, such as stealing
> a few low-order fraction bits to hold decoration bits.
To what purpose? There is substantial complexity in any such scheme.
That complexity needs a strong justification in order to overcome the
presumption that trivial operations should be trivial to describe.
> That would
> preserve power-of-two sizes and permit relatively painless conversion
> between internal and external formats, at least for BFP. (This does
> not work for DFP with DPD encoding, though it would be ok with BID.)
At root this is a tradeoff of time for space, which is just another
efficiency criterion. Regularity of layout for data structures is
useful for human planning purposes, but means little in the context of
compiler-generated structures.
More importantly, padding to a multiple of the size of the component
data (FP values) gives the standard a much better chance to survive
for the long term. Consider that after the standard has been in use
for a few years it may be useful to add a few bits for things like
open-left/lo, open-right/hi, empty, new tags, and a few user-defined
attributes. Reserving the padding for future expansion appears to me
to be a sensible approach. This is especially true because the standard
under consideration is not a compromise among existing practices, but
something closer to a de novo assessment of the ideal proposal.
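Here is a Python sketch of the padding alternative. An interval record is made of two binary64 endpoints plus one 64-bit word, so the record size stays a multiple of the component size. The three flag assignments are hypothetical examples drawn from the list above. All other bits are treated as reserved and must be zero. This reader rejects unknown bits rather than ignoring them, but that is only one possible policy.

```python
import struct

# Hypothetical flag assignments; every other bit in the word is reserved.
FLAG_EMPTY = 1 << 0
FLAG_OPEN_LO = 1 << 1
FLAG_OPEN_HI = 1 << 2
KNOWN_FLAGS = FLAG_EMPTY | FLAG_OPEN_LO | FLAG_OPEN_HI

# lo, hi, flag word: three 8-byte fields, big-endian, no implicit padding.
RECORD = struct.Struct(">ddQ")


def encode(lo: float, hi: float, flags: int = 0) -> bytes:
    """Write one interval record; reserved bits must be zero."""
    if flags & ~KNOWN_FLAGS:
        raise ValueError("reserved flag bits must be zero")
    return RECORD.pack(lo, hi, flags)


def decode(data: bytes) -> tuple:
    """Read one interval record, refusing bits this reader does not know."""
    lo, hi, flags = RECORD.unpack(data)
    unknown = flags & ~KNOWN_FLAGS
    if unknown:
        # A newer writer may have assigned bits this reader predates.
        raise ValueError(f"unknown flag bits: {unknown:#x}")
    return lo, hi, flags
```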
I suggest that commonality between the internal and external
representations might be useful, but is not a valid criterion for
determining the characteristics of either representation. We should
expect that implementations will vary in internal representation, but
any implementor who finds a need to vary the external representation
will have found a defect in the standard. We need to ensure that the
external representation is as simple and complete as possible in order
to avoid the version skews that populate the 13th floor of Hades.
Lee Winter
NP Engineering
Nashua, New Hampshire
United States of America (NDY)