
Re: Importing NaNs




----- Original Message -----
From: "Dan Zuras" <dan@xxxxxxxxx>
To: "Ivan Godard" <igodard@xxxxxxxxxxx>
Cc: <stds-754@xxxxxxxx>; <dan@xxxxxxxxx>
Sent: Thursday, March 27, 2003 3:07 PM
Subject: Re: Importing NaNs


> As you should be able to move the FPs directly, I'm not sure
> why the "wire form" should be different than the data format.
>
> After all, byte order aside, how is a number coming from RAM
> different from a number coming from France?
> Dan

France or wherever does not matter, of course; what matters is the format.
We call these non-native forms "foreign" after going through a barrage of
jokes when we referred to "alien" forms, but the change doesn't appear to
have helped :-)

There is never a problem when the data consumed was also produced on the
same system or on another system that uses the same representation
conventions. The problems arise when a system must consume data produced on
some other system with inconsistent representation assumptions. In this
respect the radix problem is identical to the endian problem.

To cope with endian problems, in particular the conflicting endianness
assumptions built into several commercially important operating systems,
most modern hardware, including TMX, supports selecting the endianness at
boot time. This is typically implemented by a swizzle box in the memory
pathway. The functional units are hard wired to one endianness or the other,
but all data that flows between the core and external memory, in both
directions, is endian swapped (or not) according to a mode bit set at boot
configuration time. This lets chips support software with opposite
endianness assumptions.

Unfortunately, there is an increasing problem with wire data, which arrives
(by file or network) in an endian format that is not the memory format
selected by the boot mode (whether it matches the endianness of the actual
processor does not matter). Real applications can have a significant
fraction of their processing capacity eaten by software endianness
conversion. This is what I meant by "foreign formats", and the problem
applies to radix format conversion too.

Representation of interchange data as text may be formally adequate, and
practically adequate for file datasets, but the conversion load is much too
great for practical network and streaming applications. All those x/10 and
x%10 operations just won't work for sample streams in the gigasample/second
range. So the data has to arrive in binary, and format conversion has to
take no longer than roughly the time of an add.
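
To illustrate the difference in conversion work, here is a minimal sketch in
Python (chosen only for readability). The function names are hypothetical,
and the 32-bit big-endian word is an assumed stand-in for a binary
interchange form, not anything the Standard defines. The text path needs one
division and one modulo per digit; the binary path is a single fixed-width
pack.

```python
import struct

def to_decimal_text(n: int) -> str:
    """Convert a non-negative integer to decimal text, one digit at a time."""
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        digits.append(chr(ord("0") + n % 10))  # one modulo per digit
        n //= 10                                # one division per digit
    return "".join(reversed(digits))

def to_binary_wire(n: int) -> bytes:
    """Pack the same value as a fixed-width 32-bit big-endian word (assumed form)."""
    return struct.pack(">I", n)

sample = 0x12345678
text_form = to_decimal_text(sample)   # nine digits, so nine divide/modulo pairs
wire_form = to_binary_wire(sample)    # always exactly four bytes
```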

For endianness, TMX defines variants of the Load and Store operations in
which the memory endianness is an explicit attribute of the instruction,
overriding the boot-time endianness mode. That lets the swizzle box produce
and consume data not only in the software's native endianness, as set at
boot configuration, but also in a selected endianness for interchange with
systems of the opposite endianness, including TMXs with a different boot
mode choice. The hardware for the swizzle box already exists, so the
instruction is essentially free, yet it saves as much as 30% on some network
applications. BTW, we are applying for a patent on this.
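
A software analogy of the idea, as a rough sketch only: the load and store
functions below are hypothetical and do not model the actual TMX
instructions. The host byte order stands in for the boot-time mode, and the
optional argument plays the role of the explicit endianness attribute.

```python
import sys

# Stand-in for the endianness selected at boot time (an assumption of this sketch).
NATIVE = sys.byteorder

def load(memory: bytes, addr: int, width: int, endian: str = NATIVE) -> int:
    """Read `width` bytes at `addr`, interpreting them in the given byte order."""
    return int.from_bytes(memory[addr:addr + width], endian)

def store(memory: bytearray, addr: int, value: int, width: int,
          endian: str = NATIVE) -> None:
    """Write `value` as `width` bytes at `addr` in the given byte order."""
    memory[addr:addr + width] = value.to_bytes(width, endian)

packet = bytes([0x12, 0x34, 0x56, 0x78])   # four bytes as they arrive off the wire
as_big = load(packet, 0, 4, "big")          # explicit attribute: big-endian
as_little = load(packet, 0, 4, "little")    # explicit attribute: little-endian

out = bytearray(4)
store(out, 0, 0x12345678, 4, "big")         # produce big-endian interchange data
```

The point of the explicit attribute is that the caller names the byte order
of the data, so no separate swap pass over the buffer is needed.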

It seems to me that a similar problem will exist w/r/t radix. At the very
least, radix conversions should be both defined and required, or your native
systems will work fine on their own data, but not be able to talk with
systems using the opposite radix save by prohibitively expensive textual
representations. Moreover, the problem arises even in single-radix
interchange if the binary representations differ - 2's complement vs.
sign/magnitude for example. Hence I urge the Standard to define a canonical
binary Wire Form, and require that all conforming implementations support
conversion between Wire Form and native concrete representation in both
directions. The Wire Form should be one that is relatively easy to convert
to any concrete representation supported by the Standard - I note that you
have already precluded some of the more bizarre possible concrete
representations such as Gray (Grey?) code. It need not have the same
bitwidth as the corresponding native form, nor even be very bit efficient at
all, but must be fixed width to avoid serializing the stream and to avoid
problems with loss of phase coherency between transmitter and receiver.
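
As an illustration of the kind of conversion meant here, the sketch below
converts integer bit patterns between 2's complement and sign/magnitude at a
fixed width. Everything in it is an assumption made for the example: the
32-bit width, the choice of sign/magnitude as the wire side, and the
function names. No such Wire Form is defined anywhere.

```python
WIRE_BITS = 32  # assumed fixed width, for illustration only

def twos_to_sign_magnitude(word: int, bits: int = WIRE_BITS) -> int:
    """Convert a 2's complement bit pattern to a sign/magnitude bit pattern."""
    sign_bit = 1 << (bits - 1)
    if not word & sign_bit:
        return word                      # non-negative: patterns are identical
    magnitude = (1 << bits) - word       # absolute value of the negative number
    if magnitude == sign_bit:
        # The most negative 2's complement value has no sign/magnitude
        # counterpart at the same width.
        raise OverflowError("value not representable in sign/magnitude")
    return sign_bit | magnitude

def sign_magnitude_to_twos(word: int, bits: int = WIRE_BITS) -> int:
    """Convert a sign/magnitude bit pattern to a 2's complement bit pattern."""
    sign_bit = 1 << (bits - 1)
    mask = (1 << bits) - 1
    if not word & sign_bit:
        return word                      # non-negative: patterns are identical
    magnitude = word & (sign_bit - 1)
    return (-magnitude) & mask           # negative zero collapses to zero

minus_one_wire = twos_to_sign_magnitude(0xFFFFFFFF)     # -1 in 2's complement
minus_one_native = sign_magnitude_to_twos(minus_one_wire)
```

The two edge cases (the most negative 2's complement value, and negative
zero) show why the conversions would need to be defined rather than left to
each implementation.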

Please forgive me if I am just displaying my ignorance in expressing these
concerns - it won't have been the first time :-)

Ivan
