
Re: Michel's comments on interchange representation



The C and C++ "char" type is a separate type with the same representation as either "signed char" or "unsigned char", depending on the implementation.  On Intel it's traditionally signed.  On PowerPC it's traditionally unsigned, but the XL compilers have an option so the user can choose.  Which is most efficient depends on the hardware, and there are uses for each.  For representing a character, either works, but unsigned is sometimes better.  For representing a small integer, it depends on whether the integer should be signed or unsigned.  For representing a decoration, it shouldn't matter except when doing an ordered comparison, where unsigned is better.
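
As a quick illustration (my sketch, not part of the original thread), here is a minimal C program showing the implementation-defined signedness of plain char and an ordered comparison whose outcome depends on it:

    #include <stdio.h>
    #include <limits.h>

    int main(void) {
        /* Whether plain char is signed or unsigned is implementation-defined;
           CHAR_MIN reveals the choice at compile time. */
        if (CHAR_MIN < 0)
            printf("plain char is signed   (range %d..%d)\n", CHAR_MIN, CHAR_MAX);
        else
            printf("plain char is unsigned (range %d..%d)\n", CHAR_MIN, CHAR_MAX);

        /* An ordered comparison whose result depends on that choice:
           (char)0xC0 is 192 where char is unsigned, typically -64 where
           it is signed (the conversion itself is implementation-defined). */
        char c = (char)0xC0;
        printf("c > 128 is %s\n", c > 128 ? "true" : "false");
        return 0;
    }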

I think when Michel wrote "128 is not "small enough" for CHAR when CHAR is considered to be signed!" he meant "when" in the sense of "in implementations where" rather than "because".  Using 128 will cause trouble whenever one does a <, <=, > or >= comparison on decorations in C and C++; the same applies in Java, where type "byte" is signed (and "char" is an unsigned 16-bit Unicode code unit).  Using a smaller value avoids the problem.
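
A minimal sketch of that failure mode (mine, using the decoration values quoted in Michel's message below), on an implementation where plain char is signed and 8 bits wide:

    #include <stdio.h>

    /* Decoration values as quoted in Michel's message below: */
    enum { ILL = 0, TRV = 32, DEF = 64, DAC = 96, COM = 128 };

    int main(void) {
        char dec = COM;             /* where char is signed 8-bit, this
                                       typically wraps to -128 */
        if (dec >= DAC)
            printf("com >= dac, as intended\n");
        else
            printf("surprise: com compares below dac\n");

        unsigned char udec = COM;   /* always holds 128 */
        printf("with unsigned char: %s\n", udec >= DAC ? "ok" : "broken");
        return 0;
    }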

- Ian McIntosh          IBM Canada Lab         Compiler Back End Support and Development




From: Vincent Lefevre <vincent@xxxxxxxxxx>
To: Ian McIntosh/Toronto/IBM@IBMCA,
Date: 2014-06-23 05:35 AM
Subject: Re: Michel's comments on interchange representation





On 2014-06-18 09:28:28 -0400, Michel Hack wrote:
> This brings me to the encoding of the decorations.  The current layout
> fits well with the global-bitstring approach (which has perhaps not yet
> been ruled out, as Baker suggests we may need a separate vote), but it
> becomes awkward when we describe the triple in terms of existing formats:
> we just invented a new datatype!  In byte-oriented systems a single byte
> does not have Endianness issues -- but on word-oriented machines (which
> IEEE 754 does not rule out) it does raise a whole new set of issues,
> namely how to align the 8-bit item in a larger container.  It would be
> much better in my opinion to map decorations on "small integers", which
> is a fairly standard datatype (called "char" in C -- which may however
> have more than 8 bits).  Then the decorations would be stored in whatever
> way small integers are stored, and portability issues become standard,
> even though they don't go away.
>
> Next question then is why were the particular values chosen?  (I know,
> it's because somebody *was* thinking about concatenating bit strings.)
>    ill    0
>    trv   32
>    def   64
>    dac   96
>    com  128
>
> Right away, we run into an issue for some implementations:  128 is
> not "small enough" for CHAR when CHAR is considered to be signed!

This is something specific to the C language. What is a char in
other languages?

> Would it not have been better to use 0 through 4, as I seem to recall
> we had at one point?
>
> So you see, John, that this (what appears to be a) late change DOES
> have substantial consequences, and needs more than editorial changes.
> Frankly, when I first raised the issue, the decoration encoding had
> been only slightly annoying, but now the "signed char" issue makes it
> a serious issue.

Ditto for the signedness of char (but C also has unsigned char,
which is the best type for a byte, ***as defined by C***).
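
A small aside (my sketch, not Vincent's): unsigned char is also the type C guarantees can portably inspect raw object representation, which is what matters for the layout and byte-order questions Michel raises above:

    #include <stdio.h>

    int main(void) {
        /* unsigned char has no padding bits and may alias any object,
           so it is the portable way to look at raw storage -- e.g. to
           see how a wider value is laid out in memory. */
        unsigned int x = 0x01020304;
        const unsigned char *bytes = (const unsigned char *)&x;
        for (size_t i = 0; i < sizeof x; i++)
            printf("%02X ", bytes[i]);  /* 04 03 02 01 on little-endian */
        printf("\n");
        return 0;
    }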

--
Vincent Lefèvre <vincent@xxxxxxxxxx> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)


