R – Potential problem with C standard malloc’ing chars

cc++11mallocsizeofstandards

When answering a comment to another answer of mine here, I found what I think may be a hole in the C standard (c1x, I haven't checked the earlier ones and yes, I know it's incredibly unlikely that I alone among all the planet's inhabitants have found a bug in the standard). Information follows:

  1. Section 6.5.3.4 ("The sizeof operator") para 2 states "The sizeof operator yields the size (in bytes) of its operand".
  2. Para 3 of that section states: "When applied to an operand that has type char, unsigned char, or signed char, (or a qualified version thereof) the result is 1".
  3. Section 7.20.3.3 describes void *malloc(size_t sz) but all it says is "The malloc function allocates space for an object whose size is specified by size and whose value is indeterminate". It makes no mention at all what units are used for the argument.
  4. Annex E startes the 8 is the minimum value for CHAR_BIT so chars can be more than one byte in length.

My question is simply this:

In an environment where a char is 16 bits wide, will malloc(10 * sizeof(char)) allocate 10 chars (20 bytes) or 10 bytes? Point 1 above seems to indicate the former, point 2 indicates the latter.

Anyone with more C-standard-fu than me have an answer for this?

Best Answer

In a 16-bit char environment malloc(10 * sizeof(char)) will allocate 10 chars (10 bytes), because if char is 16 bits, then that architecture/implementation defines a byte as 16 bits. A char isn't an octet, it's a byte. On older computers this can be larger than the 8 bit de-facto standard we have today.

The relevant section from the C standard follows:

3.6 Terms, definitions and symbols

byte - addressable unit of data storage large enough to hold any member of the basic character set of the execution environment...

NOTE 2 - A byte is composed of a contiguous sequence of bits, the number of which is implementation-defined.

Related Topic