On 2014-06-23 09:41:21 -0400, Michel Hack wrote:
I had started to think in terms of a plain character string, but here
are several reasons why I changed to the mixed text/binary format:
(0) In the above, 'p1788' is inappropriate; the 'p' (for Project) will
go away if and when this actually becomes an IEEE standard.
Yes, with "ieee", this is better. But some people might prefer the
reverse domain name notation:
https://en.wikipedia.org/wiki/Reverse_domain_name_notation
e.g. a name starting with "org.ieee."
(1) I would have had to invent short names for the byte ordering.
Dima's "MSB" is incomplete; it should be "MSBF" for "Most
Significant Byte First", but may be better than "BE" or "LE".
I think that "BE" and "LE" are commonly used, e.g. by
http://tools.ietf.org/html/rfc2781
for some charsets.
IMHO, only lower case characters (as opposed to upper case) should be
used in the name.
(2) I wanted a fixed-length signature, preferably a multiple of 4 bytes.
That fits more easily into structure layouts in many languages.
(3) I want to accommodate ALL 754-2008 formats, including the extendable
ones that can have arbitrary size (in multiples of 32 bits). Such
sizes are better represented in binary than in a decimal character
string.
Of course, not completely arbitrary sizes. But... yesterday, and this
is recalled in your next point, you said:
"assuming that the size is less than 2**24, which is presumably
large enough to cover any extended format being contemplated in
the near future"
I'm not so sure. What I know is that some people work with much larger
precisions. I don't know whether they are interested in regarding such
numbers as numbers of some 754-2008 extendable format, though.
Also, make sure that the signature format is OK for the next 30 years
(when considering standards, 30 years is the near future).
For this reason, a 32-bit size field may not be enough. I would go for
a 64-bit one.
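
To make the size argument concrete, here is a minimal sketch in Python. The
layout is purely hypothetical (a 4-byte ASCII tag, 4 reserved bytes and a
64-bit size in bits); it is not the format proposed in this thread, and the
names SIG, pack_signature and fits_in_32_bits are made up for illustration.
It only shows that such a signature keeps a fixed length that is a multiple
of 4 bytes, and that a size like 2**40 bits no longer fits in a 32-bit field.

```python
import struct

# Hypothetical layout (an assumption, not the proposal discussed here):
# 4-byte ASCII tag, 4 reserved (zero) bytes, 64-bit unsigned size in bits.
# ">" selects big-endian with no alignment padding.
SIG = struct.Struct(">4s4xQ")

def pack_signature(size_bits, tag=b"ieee"):
    """Return the 16-byte signature for a format of size_bits bits."""
    return SIG.pack(tag, size_bits)

def fits_in_32_bits(size_bits):
    """Tell whether size_bits could be stored in a 32-bit unsigned field."""
    try:
        struct.pack(">I", size_bits)
        return True
    except struct.error:
        return False

if __name__ == "__main__":
    print(SIG.size)                  # total length in bytes
    print(fits_in_32_bits(2**24))    # the limit quoted above
    print(fits_in_32_bits(2**40))    # a much larger precision
    print(SIG.unpack(pack_signature(2**40)))
```

With this layout the signature is 16 bytes long whatever the size value is,
so point (2) still holds.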
(4) The method of detecting Endianness by looking at a binary field of
known content is quite general. If in fact the application already
knows the size (which is likely), this could even detect some messy
mixed-endian formats, although there are values where all four bytes
would be the same and hence useless as an indicator. (That's why I
mentioned 2**24 as a practical upper limit.)
I think that detecting endianness should be done in a clean way.
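
For reference, the method described in (4) could look like the sketch below
(Python again). It assumes that the application already knows the expected
value of a 4-byte size field; the function name detect_byte_order and the
returned labels are mine, not part of any proposal. It also shows the
weakness mentioned above: when the big-endian and little-endian encodings of
the value are identical, the field says nothing about the byte order.

```python
import struct

def detect_byte_order(field, expected):
    """Guess the byte order of a 4-byte field whose value is known.

    field    -- the 4 bytes as read from the signature
    expected -- the value the application expects (e.g. the size)
    """
    be = struct.pack(">I", expected)
    le = struct.pack("<I", expected)
    if be == le:
        # Same bytes in both orders (e.g. all four bytes equal):
        # the field is useless as an indicator.
        return "ambiguous"
    if field == be:
        return "be"
    if field == le:
        return "le"
    if sorted(field) == sorted(be):
        # Right bytes, unexpected order: some mixed-endian layout.
        return "mixed"
    return "unknown"

if __name__ == "__main__":
    print(detect_byte_order(struct.pack(">I", 128), 128))
    print(detect_byte_order(struct.pack("<I", 128), 128))
    print(detect_byte_order(b"\x00\x00\x80\x00", 128))
    print(detect_byte_order(b"\x01\x01\x01\x01", 0x01010101))
```

An explicit byte-order indicator would avoid the "ambiguous" case
altogether.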