C Data Types – Why Aren’t Platform Specific Integer Types Deprecated?


TL;DR: Why isn't everybody screaming, "Don't use short, int, and long unless you really need to, and you very likely don't need to!"


I understand that, in theory, by using the types short, int, and long, you let the compiler choose the length that is most efficient for the given processor.

But is this a case of premature optimization being the root of all evil?

Suppose I have an integer variable that I know will always hold numbers from 1 to 1000. My understanding is that, assuming I am not worried about the memory difference between two and four bytes, the proponents of short/int/long would have me make that variable an int, because that way the compiler can choose 16 bits or 32 bits depending on what is more efficient for the processor. If I had made it a uint16_t, the compiler might not be able to generate code that is quite as fast.
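
To make the two options concrete (the variable names here are made up, purely for illustration):

#include <stdint.h>

/* A value known to stay within 1..1000. */
int      count_native = 1000;  /* compiler picks its natural width, guaranteed >= 16 bits */
uint16_t count_fixed  = 1000;  /* exactly 16 bits on every platform that provides uint16_t */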

But on modern hardware is that even true? Or rather, is the speed it might gain me (if any) really worth the much more likely possibility that using an imprecise type leads to a major bug in my program? For instance, I might use int throughout my program and think of it as representing a 32-bit value, because that is how it has been represented on every platform I've used for the past 20 years, but then my code gets compiled on an unusual platform where int is two bytes and all sorts of bugs appear.

And aside from bugs, it just seems like an annoyingly imprecise way for programmers to talk about data. As an example, here is the definition that Microsoft gives in 2019 for a GUID structure:

typedef struct _GUID {
  unsigned long  Data1;
  unsigned short Data2;
  unsigned short Data3;
  unsigned char  Data4[8];
} GUID;

Because of what a UUID is, that long has to mean 32 bits, those shorts have to mean 16 bits, and that char has to mean 8 bits. So why continue to talk in this imprecise language of "short", "long" and (heaven help us) "long long"?

Best Answer

I understand that, in theory, by using the types short, int, and long, you let the compiler choose the length that is most efficient for the given processor.

That is only partially true. All of those types have a guaranteed minimum size in ANSI C (AFAIK even in ANSI C89). Code that relies only on those minimum sizes is still portable. Cases where the maximum size of a type matters for portability are far less frequent. That said, I have seen (and written) lots of code over the years that assumed int to be at least 32 bits, clearly written for environments with CPUs of at least 32 bits.
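
As a side note (my own illustration, not something the standard requires you to write), those minimum guarantees can even be restated as compile-time checks on a C11 compiler:

#include <limits.h>

/* These assertions merely restate what ANSI C already guarantees:
   short and int cover at least 16 bits, long at least 32 bits. */
_Static_assert(SHRT_MAX >= 32767, "short must be at least 16 bits");
_Static_assert(INT_MAX >= 32767, "int must be at least 16 bits");
_Static_assert(LONG_MAX >= 2147483647L, "long must be at least 32 bits");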

But is this a case of premature optimization [...]?

Premature optimization is not only about optimizing for speed. It is about investing extra effort and making code more complicated for an (often pathological) "just in case" reason. "Just in case it could be slow" is only one of those potential reasons. So avoiding the use of int "just in case" the code might one day be ported to a 16-bit platform could also be seen as a form of premature optimization, when that kind of porting will likely never happen.

That said, I think the part you wrote about int is correct to some degree: if there is any evidence that a program might get ported from a 32-bit to a 16-bit platform, it would be best not to rely on int having 32 bits, and to use either long or a specific C99 data type like int32_t or int_least32_t wherever one is unsure whether 16 bits are enough. One could also use a global typedef to provide int32_t on platforms which are not C99 compliant. All of this requires a little extra effort (at least in teaching the team which special data types were used in the project, and why).
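
A rough sketch of what that extra effort might look like (the fallback branch, and the assumption that long is 32 bits on the pre-C99 platform, are illustrative, not prescriptive):

/* Use the exact-width types where available (C99), otherwise fall back
   to a project-wide typedef. */
#if __STDC_VERSION__ >= 199901L
#include <stdint.h>
#else
typedef long int32_t;       /* assumes long is 32 bits on this pre-C99 platform */
typedef long int_least32_t;
#endif

int_least32_t counter = 0;  /* guaranteed to hold values that need up to 32 bits */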

See also this older SO question, whose top answer says that most people don't need that degree of portability.

And as to your example of the GUID structure: the structure shown seems mostly fine; it uses data types that are guaranteed to be large enough for each of the parts on every ANSI-compliant platform. So even if someone tries to use this structure for writing portable code, that would be perfectly possible.

As you noted yourself, if someone tried to use this structure as a spec for a GUID, they could complain that it is imprecise to some degree and that it requires reading the documentation in full to get an unambiguous spec. This is one of the less frequent cases where the maximum size of the types may matter.
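
If one wanted the widths spelled out in the type itself, a fixed-width restatement could look like this (my own sketch, not Microsoft's definition):

#include <stdint.h>

typedef struct _GUID_FIXED {
  uint32_t Data1;
  uint16_t Data2;
  uint16_t Data3;
  uint8_t  Data4[8];
} GUID_FIXED;

/* The fields add up to 128 bits, though the standard still does not
   strictly guarantee that sizeof(GUID_FIXED) is exactly 16 bytes,
   because padding remains implementation-defined. */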

Other problems could arise when the content of such a struct is formatted as a string, serialized to binary, stored, or transmitted somewhere while making assumptions about the maximum size of each individual field, about the total size being exactly 128 bits, about endianness, or about the precise binary encoding of those data types. But since the documentation of the GUID struct does not make any promises about the underlying binary representation, one should not make any such assumptions when trying to write portable code.
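
For instance, serializing field by field with an explicitly chosen byte order removes any dependence on the in-memory layout; a minimal sketch, assuming the fixed-width variant above and an arbitrarily chosen little-endian wire order:

#include <stdint.h>
#include <string.h>

/* Write a GUID into exactly 16 bytes, little-endian for the multi-byte
   fields, independent of the host's endianness and struct padding. */
static void guid_to_bytes(const GUID_FIXED *g, uint8_t out[16])
{
    out[0] = (uint8_t)(g->Data1);
    out[1] = (uint8_t)(g->Data1 >> 8);
    out[2] = (uint8_t)(g->Data1 >> 16);
    out[3] = (uint8_t)(g->Data1 >> 24);
    out[4] = (uint8_t)(g->Data2);
    out[5] = (uint8_t)(g->Data2 >> 8);
    out[6] = (uint8_t)(g->Data3);
    out[7] = (uint8_t)(g->Data3 >> 8);
    memcpy(&out[8], g->Data4, 8);
}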