C++ – turn unsigned char into char and vice versa


I want to use a function that expects data like this:

void process(char *data_in, int data_len);

So it's just processing some bytes really.

But I'm more comfortable working with "unsigned char" when it comes to raw bytes (it somehow "feels" more right to deal with positive 0 to 255 values only), so my question is:

Can I always safely pass an unsigned char * into this function?

In other words:

  • Is it guaranteed that I can safely convert (cast) between char and unsigned char at will, without any loss of information?
  • Can I safely convert (cast) between pointers to char and unsigned char at will, without any loss of information?

Bonus: Is the answer the same in C and C++?

Best Answer

The short answer is yes if you use an explicit cast, but to explain it in detail, there are three aspects to look at:

1) Legality of the conversion
Converting between signed T* and unsigned T* (for some type T) in either direction is generally possible because the source type can first be converted to void * (this is a standard conversion, §4.10), and the void * can be converted to the destination type using an explicit static_cast (§5.2.9/13):

static_cast<unsigned char*>(static_cast<void *>(data_in))

This can be abbreviated (§5.2.10/7) as

reinterpret_cast<unsigned char *>(data_in)

because char is a standard-layout type (§3.9.1/7,8 and §3.9/9) and signedness does not change alignment (§3.9.1/1). It can also be written as a C-style cast:

(unsigned char *)(data_in)

Again, this works both ways, from unsigned* to signed* and back. There is also a guarantee that if you apply this procedure one way and then back, the pointer value (i.e. the address it's pointing to) won't have changed (§5.2.10/7).
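To make the equivalence of the three cast forms and the round-trip guarantee concrete, here is a minimal sketch. The helper name round_trip is made up for illustration and does not appear in the question.

```cpp
#include <cassert>

// Hypothetical helper (not from the question): converts a char* to
// unsigned char* with each of the three cast forms, then converts back.
char *round_trip(char *p) {
    // Two-step form via void*
    unsigned char *a = static_cast<unsigned char *>(static_cast<void *>(p));
    // Abbreviated form
    unsigned char *b = reinterpret_cast<unsigned char *>(p);
    // C-style cast
    unsigned char *c = (unsigned char *)(p);

    // All three forms yield the same address.
    assert(a == b && b == c);

    // Converting back gives the original pointer value.
    return reinterpret_cast<char *>(b);
}
```

Calling round_trip(buf) on any char buffer should return a pointer that compares equal to buf.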

All of this applies not only to conversions between signed char * and unsigned char *, but also to char */unsigned char * and char */signed char *, respectively. (char, signed char and unsigned char are formally three distinct types, §3.9.1/1.)

To be clear, it doesn't matter which of the three cast-methods you use, but you must use one. Merely passing the pointer will not work, as the conversion, while legal, is not a standard conversion, so it won't be performed implicitly (the compiler will issue an error if you try).
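Applied to the function from the question, this could look like the sketch below. Only the signature of process comes from the question. Its body (reversing the bytes in place) and the wrapper process_bytes are assumptions made purely for illustration.

```cpp
#include <algorithm>
#include <cassert>

// Stand-in for the function from the question. The body is an assumption:
// it simply reverses the bytes in place.
void process(char *data_in, int data_len) {
    assert(data_in != nullptr && data_len >= 0);
    std::reverse(data_in, data_in + data_len);
}

// Hypothetical wrapper so that callers can keep their data as unsigned char.
void process_bytes(unsigned char *bytes, int len) {
    // process(bytes, len);  // error: no implicit unsigned char* -> char*
    process(reinterpret_cast<char *>(bytes), len);
}
```

With a buffer holding {1, 200, 255}, calling process_bytes(buf, 3) leaves {255, 200, 1} in the buffer, even though 200 and 255 do not fit into a signed char as positive values.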

2) Well-definedness of the access to the values
What happens if, inside the function, you dereference the pointer, i.e. you perform *data_in to retrieve a glvalue for the underlying character; is this well-defined and legal? The relevant rule here is the strict-aliasing rule (§3.10/10):

If a program attempts to access the stored value of an object through a glvalue of other than one of the following types the behavior is undefined:

  • [...]
  • a type that is the signed or unsigned type corresponding to the dynamic type of the object,
  • [...]
  • a char or unsigned char type.

Therefore, accessing a signed char (or char) through an unsigned char* (or char*) and vice versa is not disallowed by this rule – you should be able to do this without problems.
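As a small illustration of such an access, the following sketch reads a char buffer through an unsigned char glvalue. The function sum_bytes is a made-up example, not something from the question.

```cpp
#include <cassert>

// Hypothetical helper: sums the bytes of a char buffer, reading each byte
// through an unsigned char glvalue.
unsigned sum_bytes(const char *text, int len) {
    assert(text != nullptr && len >= 0);
    const unsigned char *bytes = reinterpret_cast<const unsigned char *>(text);
    unsigned total = 0;
    for (int i = 0; i < len; ++i) {
        total += bytes[i];  // access through unsigned char: allowed by the rule
    }
    return total;
}
```

For a buffer holding the char values 1, 2 and 3 the result is 6, because non-negative values have the same representation in the signed and the unsigned type.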

3) Resulting values
After dereferencing the type-converted pointer, will you be able to work with the value you get? It's important to bear in mind that the conversion and dereferencing of the pointer described above amounts to reinterpreting (not changing!) the bit pattern stored at the address of the character. So what happens when a bit pattern for a signed character is interpreted as that of an unsigned character (or vice versa)?

When going from unsigned to signed, the typical effect (8-bit bytes, two's complement) will be that values from 0 to 127 stay the same, and values from 128 to 255 become negative. Similarly in reverse: when going from signed to unsigned, negative values will appear as values of 128 or greater.
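The sketch below shows this typical outcome. It assumes 8-bit bytes and a two's complement representation, and the function names are invented for illustration. It uses signed char rather than plain char, because whether plain char is signed depends on the platform.

```cpp
#include <cassert>
#include <climits>

static_assert(CHAR_BIT == 8, "this sketch assumes 8-bit bytes");

// Hypothetical helpers: reinterpret the bit pattern of one byte.
signed char view_as_signed(unsigned char u) {
    return *reinterpret_cast<signed char *>(&u);
}

unsigned char view_as_unsigned(signed char s) {
    return *reinterpret_cast<unsigned char *>(&s);
}

// Typical results; these hold on two's complement platforms only.
void check_typical_values() {
    assert(view_as_signed(127) == 127);    // up to 127: unchanged
    assert(view_as_signed(128) == -128);   // 128 and above: negative
    assert(view_as_signed(255) == -1);
    assert(view_as_unsigned(-1) == 255);   // negative: 128 or greater
    assert(view_as_unsigned(-128) == 128);
}
```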

But this behaviour isn't actually guaranteed by the Standard. The only thing the Standard guarantees is that for all three types, char, unsigned char and signed char, all bits (not necessarily 8, btw) are used for the value representation. So if you interpret one as the other, make a few copies and then store it back to the original location, you can be sure that there will be no information loss (as you required), but you won't necessarily know what the values actually mean (at least not in a fully portable way).
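The "no information loss" part can be sketched as follows: bytes are read as plain char, copied, and stored back, and the unsigned values seen afterwards are unchanged. The helper copy_as_char is hypothetical. What the intermediate char values mean numerically is exactly the part that is not portable.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper: copies len bytes from src to dst, handling every
// byte as a plain char on the way.
void copy_as_char(const unsigned char *src, unsigned char *dst, int len) {
    assert(src != nullptr && dst != nullptr && len >= 0);
    const char *in = reinterpret_cast<const char *>(src);
    char *out = reinterpret_cast<char *>(dst);
    for (int i = 0; i < len; ++i) {
        char c = in[i];  // may be negative where plain char is signed
        out[i] = c;      // the same bit pattern is stored back
    }
}
```

After copying {0, 127, 128, 255} this way, std::memcmp on the source and destination buffers should return 0.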
