Friday, August 28, 2009

Episode Seven: One char to Rule Them All

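There are three char types in C++: signed char, unsigned char, and plain char. Plain char is a distinct type from both of the others, even though in any given implementation it takes on the same values as one of them. From the standard, 3.9.1/1: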
Characters can be explicitly declared unsigned or signed. Plain char, signed char, and unsigned char are three distinct types. A char, a signed char, and an unsigned char occupy the same amount of storage and have the same alignment requirements (3.11); that is, they have the same object representation. For character types, all bits of the object representation participate in the value representation.
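
The quoted guarantees can be checked at compile time. The sketch below is one way to do it. It uses C++11 facilities (static_assert, alignof, std::is_same from <type_traits>), which postdate this post, so it assumes a C++11 or later compiler.

```cpp
#include <type_traits>

// Plain char, signed char and unsigned char are three distinct types...
static_assert(!std::is_same<char, signed char>::value, "char is not signed char");
static_assert(!std::is_same<char, unsigned char>::value, "char is not unsigned char");
static_assert(!std::is_same<signed char, unsigned char>::value,
              "signed char is not unsigned char");

// ...that nevertheless occupy the same storage...
static_assert(sizeof(char) == sizeof(signed char) && sizeof(char) == sizeof(unsigned char),
              "same amount of storage");

// ...and have the same alignment requirements.
static_assert(alignof(char) == alignof(signed char) && alignof(char) == alignof(unsigned char),
              "same alignment");
```

Since everything here is a static_assert, a conforming compiler either accepts the file silently or rejects it; there is nothing to run.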

This matters for function overloading as well as for template specialization. A call with a narrow character literal, f( 'x' ), is an exact match for a signature of type R( char ). However, if R( signed char ) and R( unsigned char ) overloads are provided but a plain char one is not, the call becomes ambiguous: converting char to either signed char or unsigned char is an integral conversion of the same rank, so neither overload is better than the other.
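
A minimal sketch of both effects, assuming C++11. The names f, g, type_name and demo_char_overloads are hypothetical, chosen only for illustration. The ambiguous call is left in a comment because it does not compile.

```cpp
#include <cassert>
#include <string>

// Overload set that includes plain char: f('x') is an exact match for f(char).
std::string f(char)          { return "char"; }
std::string f(signed char)   { return "signed char"; }
std::string f(unsigned char) { return "unsigned char"; }

// Overload set WITHOUT plain char. char -> signed char and char -> unsigned char
// are both integral conversions of equal rank, so this call is ambiguous:
//     g('x');  // error: call to overloaded function is ambiguous
std::string g(signed char)   { return "signed char"; }
std::string g(unsigned char) { return "unsigned char"; }

// Hypothetical class template: the specialization for signed char is NOT
// selected for plain char, which falls back to the primary template.
template <typename T>
struct type_name { static std::string get() { return "other"; } };

template <>
struct type_name<signed char> { static std::string get() { return "signed char"; } };

void demo_char_overloads() {
    assert(f('x') == "char");                                         // exact match
    assert(f(static_cast<signed char>('x')) == "signed char");
    assert(g(static_cast<unsigned char>('x')) == "unsigned char");    // cast resolves g
    assert(type_name<char>::get() == "other");                        // no char specialization
    assert(type_name<signed char>::get() == "signed char");
}
```

An explicit cast, as in the g call above, is one way to resolve the ambiguity; adding an R( char ) overload or specialization is usually the cleaner fix.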

The distinction between plain and signed char is also important when working with pointers:
char const *plain_char = "some narrow string literal";
signed char const *signed_char = plain_char; // error: char and signed char are different types

int const *plain_int = 0;
signed int const *signed_int = plain_int; // OK: int and signed int are the same type
The three types of char are inherited from C, where this incompatibility between char and signed char pointers also exists.
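
The pointer rules above can also be expressed as compile-time checks. This is a sketch assuming C++11 <type_traits>; std::is_convertible reports whether the implicit conversion used in the initializations above would be allowed.

```cpp
#include <type_traits>

// int and signed int name the same type, so the pointer conversion is allowed.
static_assert(std::is_same<int, signed int>::value, "int is signed int");
static_assert(std::is_convertible<int const*, signed int const*>::value,
              "int const* converts to signed int const*");

// char is a distinct type from both signed char and unsigned char, so there is
// no implicit conversion between pointers to them.
static_assert(!std::is_convertible<char const*, signed char const*>::value,
              "char const* does not convert to signed char const*");
static_assert(!std::is_convertible<char const*, unsigned char const*>::value,
              "char const* does not convert to unsigned char const*");
```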

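Plain char is the type of narrow character literals and the element type of narrow string literals. Whether plain char is signed or unsigned is implementation-defined. Operations whose result depends on signedness therefore give implementation-dependent results on plain char. Right shifts are one example: a char is promoted to int before the shift, so a byte with the high bit set becomes a negative int when char is signed. Comparisons against zero are another. Such operations should not be applied to plain char in portable code. Plain char should be used only when working with characters, while signed char and unsigned char should be used as small integral types.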
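
A minimal sketch of these guidelines, assuming C++11. std::numeric_limits<char>::is_signed reports the implementation's choice. The helper high_nibble and the variables temperature_offset and byte_value are hypothetical names used only for illustration.

```cpp
#include <limits>

// Whether plain char is signed is implementation-defined; numeric_limits reports it.
const bool plain_char_is_signed = std::numeric_limits<char>::is_signed;

// High nibble of a byte stored in a plain char. Shifting c directly would first
// promote it to int, and a byte >= 0x80 would become negative where char is
// signed. Converting to unsigned char first gives the same result everywhere.
unsigned high_nibble(char c) {
    return static_cast<unsigned char>(c) >> 4;
}

// Small integers: state the signedness explicitly instead of using plain char.
signed char   temperature_offset = -40;  // always signed
unsigned char byte_value         = 200;  // always unsigned
```

With the conversion in place, high_nibble('\xE9') yields 14 whether plain char is signed or unsigned.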
