
Re: Michel's comments on interchange representation



The C and C++ "char" type is a separate type with the same representation as either "signed char" or "unsigned char", depending on the implementation.  On Intel it's traditionally signed.  On PowerPC it's traditionally unsigned, but the XL compilers have an option so the user can choose.  Which is most efficient depends on the hardware, and there are uses for each.  For representing a character, either works, but unsigned is sometimes better.  For representing a small integer, it depends on whether the integer should be signed or unsigned.  For representing a decoration, it shouldn't matter except when doing an ordered comparison, where unsigned is better.
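
As a quick illustration (my sketch, not part of the original thread), here is a minimal C program showing the implementation-defined signedness of plain char and an ordered comparison whose outcome depends on it:

    #include <stdio.h>
    #include <limits.h>

    int main(void) {
        /* Whether plain char is signed or unsigned is implementation-defined;
           CHAR_MIN reveals the choice at compile time. */
        if (CHAR_MIN < 0)
            printf("plain char is signed   (range %d..%d)\n", CHAR_MIN, CHAR_MAX);
        else
            printf("plain char is unsigned (range %d..%d)\n", CHAR_MIN, CHAR_MAX);

        /* An ordered comparison whose result depends on that choice:
           (char)0xC0 is 192 where char is unsigned, typically -64 where
           it is signed (the conversion itself is implementation-defined). */
        char c = (char)0xC0;
        printf("c > 128 is %s\n", c > 128 ? "true" : "false");
        return 0;
    }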

I think when Michel wrote "128 is not "small enough" for CHAR when CHAR is considered to be signed!" he meant "when" in the sense of "in implementations where" rather than "because".  Using 128 will cause trouble whenever one does a <, <=, > or >= comparison on decorations in C and C++; the same applies in Java, where type "byte" is signed (and "char" is an unsigned 16-bit Unicode code unit).  Using a smaller value avoids the problem.
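
A minimal sketch of that failure mode (mine, using the decoration values quoted in Michel's message below), on an implementation where plain char is signed and 8 bits wide:

    #include <stdio.h>

    /* Decoration values as quoted in Michel's message below: */
    enum { ILL = 0, TRV = 32, DEF = 64, DAC = 96, COM = 128 };

    int main(void) {
        char dec = COM;             /* where char is signed 8-bit, this
                                       typically wraps to -128 */
        if (dec >= DAC)
            printf("com >= dac, as intended\n");
        else
            printf("surprise: com compares below dac\n");

        unsigned char udec = COM;   /* always holds 128 */
        printf("with unsigned char: %s\n", udec >= DAC ? "ok" : "broken");
        return 0;
    }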

- Ian McIntosh          IBM Canada Lab         Compiler Back End Support and Development




From: Vincent Lefevre <vincent@xxxxxxxxxx>
To: Ian McIntosh/Toronto/IBM@IBMCA,
Date: 2014-06-23 05:35 AM
Subject: Re: Michel's comments on interchange representation





On 2014-06-18 09:28:28 -0400, Michel Hack wrote:
> This brings me to the encoding of the decorations.  The current layout
> fits well with the global-bitstring approach (which has perhaps not yet
> been ruled out, as Baker suggests we may need a separate vote), but it
> becomes awkward when we describe the triple in terms of existing formats:
> we just invented a new datatype!  In byte-oriented systems a single byte
> does not have Endianness issues -- but on word-oriented machines (which
> IEEE 754 does not rule out) it does raise a whole new set of issues,
> namely how to align the 8-bit item in a larger container.  It would be
> much better in my opinion to map decorations on "small integers", which
> is a fairly standard datatype (called "char" in C -- which may however
> have more than 8 bits).  Then the decorations would be stored in whatever
> way small integers are stored, and portability issues become standard,
> even though they don't go away.
>
> Next question then is why were the particular values chosen?  (I know,
> it's because somebody *was* thinking about concatenating bit strings.)
>    ill    0
>    trv   32
>    def   64
>    dac   96
>    com  128
>
> Right away, we run into an issue for some implementations:  128 is
> not "small enough" for CHAR when CHAR is considered to be signed!

This is something specific to the C language. What is a char in
other languages?

> Would it not have been better to use 0 through 4, as I seem to recall
> we had at one point?
>
> So you see, John, that this (what appears to be a) late change DOES
> have substantial consequences, and needs more than editorial changes.
> Frankly, when I first raised the issue, the decoration encoding had
> been only slightly annoying, but now the "signed char" issue makes it
> a serious issue.

Ditto for the signedness of char (but C also has unsigned char,
which is the best type for a byte, ***as defined by C***).
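
A small aside (my sketch, not Vincent's): unsigned char is also the type C guarantees can portably inspect raw object representation, which is what matters for the layout and byte-order questions Michel raises above:

    #include <stdio.h>

    int main(void) {
        /* unsigned char has no padding bits and may alias any object,
           so it is the portable way to look at raw storage -- e.g. to
           see how a wider value is laid out in memory. */
        unsigned int x = 0x01020304;
        const unsigned char *bytes = (const unsigned char *)&x;
        for (size_t i = 0; i < sizeof x; i++)
            printf("%02X ", bytes[i]);  /* 04 03 02 01 on little-endian */
        printf("\n");
        return 0;
    }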

--
Vincent Lefèvre <vincent@xxxxxxxxxx> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)


