I want to use a function that expects data like this:
void process(char *data_in, int data_len);
So it's just processing some bytes really.
But I'm more comfortable working with "unsigned char" when it comes to raw bytes (it somehow "feels" more right to deal with positive 0 to 255 values only), so my question is:
Can I always safely pass an unsigned char * into this function?
In other words:
- Is it guaranteed that I can safely convert (cast) between char and unsigned char at will, without any loss of information?
- Can I safely convert (cast) between pointers to char and unsigned char at will, without any loss of information?
Bonus: Is the answer the same in C and C++?
Best Answer
The short answer is yes if you use an explicit cast, but to explain it in detail, there are three aspects to look at:
1) Legality of the conversion
Converting between signed T* and unsigned T* (for some type T) in either direction is generally possible because the source type can first be converted to void * (this is a standard conversion, §4.10), and the void * can be converted to the destination type using an explicit static_cast (§5.2.9/13). This two-step conversion can be abbreviated (§5.2.10/7) as a single reinterpret_cast to the destination pointer type, because char is a standard-layout type (§3.9.1/7,8 and §3.9/9) and signedness does not change alignment (§3.9.1/1). It can also be written as a C-style cast.
Again, this works both ways, from unsigned* to signed* and back. There is also a guarantee that if you apply this procedure one way and then back, the pointer value (i.e. the address it's pointing to) won't have changed (§5.2.10/7).
All of this applies not only to conversions between signed char * and unsigned char *, but also to char * / unsigned char * and char * / signed char *, respectively. (char, signed char and unsigned char are formally three distinct types, §3.9.1/1.)
To be clear, it doesn't matter which of the three cast methods you use, but you must use one. Merely passing the pointer will not work: the conversion, while legal, is not a standard conversion, so it won't be performed implicitly (the compiler will issue an error if you try).
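The three cast forms described above can be sketched as follows. This is only an illustration: the body of process() is a hypothetical stand-in for the function from the question (it merely records its arguments), and casts_agree() is a name made up for this example.

```cpp
#include <cassert>

// Hypothetical stand-in for the question's process(); it only records its
// arguments so that the call below has an observable effect.
static char *g_last_data = nullptr;
static int g_last_len = 0;

void process(char *data_in, int data_len) {
    g_last_data = data_in;
    g_last_len = data_len;
}

// Returns true if all three cast forms yield the same address and the
// conversion back to unsigned char* restores the original pointer.
bool casts_agree() {
    static unsigned char buf[4] = {0x00, 0x7F, 0x80, 0xFF};

    // 1) Two-step conversion through void*.
    char *via_void = static_cast<char *>(static_cast<void *>(buf));
    // 2) The abbreviated form.
    char *via_reinterpret = reinterpret_cast<char *>(buf);
    // 3) C-style cast.
    char *via_c_style = (char *)buf;

    // process(buf, 4);  // would not compile: no implicit conversion exists
    process(via_reinterpret, 4);
    assert(g_last_len == 4);

    // Converting back gives the original address again.
    unsigned char *back = reinterpret_cast<unsigned char *>(via_reinterpret);
    return via_void == via_reinterpret &&
           via_reinterpret == via_c_style &&
           back == buf;
}
```

Calling casts_agree() should return true, since every form refers to the same address.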
2) Well-definedness of the access to the values
What happens if, inside the function, you dereference the pointer, i.e. you perform *data_in to retrieve a glvalue for the underlying character; is this well-defined and legal? The relevant rule here is the strict-aliasing rule (§3.10/10), which among other things allows an object to be accessed through a glvalue of the signed or unsigned type corresponding to the object's dynamic type, and through a glvalue of char or unsigned char type.
Therefore, accessing a signed char (or char) through an unsigned char* (or char*) and vice versa is not disallowed by this rule; you should be able to do this without problems.
3) Resulting values
After dereferencing the type-converted pointer, will you be able to work with the value you get? It's important to bear in mind that the conversion and dereferencing of the pointer described above amounts to reinterpreting (not changing!) the bit pattern stored at the address of the character. So what happens when a bit pattern for a signed character is interpreted as that of an unsigned character (or vice versa)?
When going from unsigned to signed, the typical effect (with 8-bit characters in two's complement) will be that values from 0 to 127 are unchanged, and values of 128 and above become negative. Similarly in reverse: when going from signed to unsigned, negative values will appear as values of 128 or greater.
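A small sketch of this typical effect, assuming 8-bit characters and a two's-complement representation (the helper names read_as_signed, read_as_unsigned and check_typical_values are made up for this illustration):

```cpp
#include <cassert>

// Reads the byte stored in an unsigned char through a signed char*.
int read_as_signed(unsigned char byte) {
    return *reinterpret_cast<signed char *>(&byte);
}

// Reads the byte stored in a signed char through an unsigned char*.
int read_as_unsigned(signed char byte) {
    return *reinterpret_cast<unsigned char *>(&byte);
}

// Typical results, ASSUMING 8-bit characters and two's complement;
// these checks may not hold on other platforms.
void check_typical_values() {
    assert(read_as_signed(0x7F) == 127);   // small values are unchanged
    assert(read_as_signed(0x80) == -128);  // 128 and above become negative
    assert(read_as_signed(0xFF) == -1);
    assert(read_as_unsigned(-1) == 255);   // negatives appear as >= 128
    assert(read_as_unsigned(-128) == 128);
}
```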
But this behaviour isn't actually guaranteed by the Standard. The only thing the Standard guarantees is that for all three types, char, unsigned char and signed char, all bits (not necessarily 8, btw) are used for the value representation. So if you interpret one as the other, make a few copies and then store it back to the original location, you can be sure that there will be no information loss (as you required), but you won't necessarily know what the values actually mean (at least not in a fully portable way).
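The "interpret, copy, store back" round trip can be sketched like this. The function name and the sample values are assumptions chosen for illustration; the check compares only the stored bytes and does not depend on what the intermediate unsigned values mean.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Views a signed char buffer as unsigned char, copies the bytes out and
// back, and reports whether the stored bytes are unchanged.
bool round_trip_preserves_bytes() {
    const signed char original[4] = {-100, -1, 0, 127};
    signed char working[4];
    std::memcpy(working, original, sizeof working);

    // Reinterpret the same storage as unsigned char.
    unsigned char *view = reinterpret_cast<unsigned char *>(working);

    // Make copies as unsigned char values ...
    unsigned char copies[4];
    for (std::size_t i = 0; i < sizeof copies; ++i) {
        copies[i] = view[i];
    }
    // ... and store them back to the original location.
    for (std::size_t i = 0; i < sizeof copies; ++i) {
        view[i] = copies[i];
    }

    return std::memcmp(working, original, sizeof original) == 0;
}

void check_round_trip() {
    assert(round_trip_preserves_bytes());
}
```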