C Data Types – Why Aren’t Platform Specific Integer Types Deprecated?


TL;DR: Why isn't everybody screaming, "Don't use short, int, and long unless you really need to, and you very likely don't need to!"


I understand that, in theory, by using the types short, int, and long, you let the compiler choose the length that is most efficient for the given processor.

But is this a case of premature optimization being the root of all evil?

Suppose I have an integer variable that I know will always hold numbers from 1 to 1000. My understanding is that, assuming I am not worried about the memory difference between two and four bytes, the proponents of short/int/long would have me make that variable an int, because that way the compiler can choose 16 bits or 32 bits depending on what is more efficient for the processor. If I had made it a uint16_t, the compiler might not be able to generate code that is quite as fast.
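
To make the two options concrete (the variable names here are made up, purely for illustration):

#include <stdint.h>

/* A value known to stay within 1..1000. */
int      count_native = 1000;  /* compiler picks its natural width, guaranteed >= 16 bits */
uint16_t count_fixed  = 1000;  /* exactly 16 bits on every platform that provides uint16_t */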

But on modern hardware is that even true? Or rather, is the speed it might gain me (if any) really worth the much more likely possibility that using an imprecise type leads to a major bug in my program? For instance, I might use int throughout my program and think of it as representing a 32-bit value, because that is how it has been represented on every platform I've used for the past 20 years, but then my code gets compiled on an unusual platform where int is two bytes and all sorts of bugs appear.

And aside from bugs, it just seems like an annoyingly imprecise way for programmers to talk about data. As an example, here is the definition that Microsoft gives in 2019 for a GUID structure:

typedef struct _GUID {
  unsigned long  Data1;
  unsigned short Data2;
  unsigned short Data3;
  unsigned char  Data4[8];
} GUID;

Because of what a UUID is, that long has to mean 32 bits, those shorts have to mean 16 bits, and that char has to mean 8 bits. So why continue to talk in this imprecise language of "short", "long" and (heaven help us) "long long"?

Best Answer

I understand that, in theory, by using the types short, int, and long, you let the compiler choose the length that is most efficient for the given processor.

That is only partially true. All of those types have a guaranteed minimum size in ANSI C (AFAIK even in ANSI C89). Code that relies only on those minimum sizes is still portable. Cases where the maximum size of a type matters for portability are far less frequent. That said, I have seen (and written) lots of code over the years that assumed int to be at least 32 bits, clearly written for environments with CPUs of at least 32 bits.
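
As a side note (my own illustration, not something the standard requires you to write), those minimum guarantees can even be restated as compile-time checks on a C11 compiler:

#include <limits.h>

/* These assertions merely restate what ANSI C already guarantees:
   short and int cover at least 16 bits, long at least 32 bits. */
_Static_assert(SHRT_MAX >= 32767, "short must be at least 16 bits");
_Static_assert(INT_MAX >= 32767, "int must be at least 16 bits");
_Static_assert(LONG_MAX >= 2147483647L, "long must be at least 32 bits");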

But is this a case of premature optimization [...]?

Premature optimization is not only about optimizing for speed. It is about investing extra effort and making code more complicated for an (often pathological) "just in case" reason. "Just in case it could be slow" is only one of those potential reasons. So avoiding the use of int "just in case" the code might one day be ported to a 16-bit platform could also be seen as a form of premature optimization, when that kind of porting will likely never happen.

That said, I think the part you wrote about int is correct to some degree: if there is any evidence that a program might get ported from a 32-bit to a 16-bit platform, it would be best not to rely on int having 32 bits, and to use either long or a specific C99 data type like int32_t or int_least32_t wherever one is unsure whether 16 bits are enough. One could also use a global typedef to provide int32_t on platforms which are not C99 compliant. All of this requires a little extra effort (at least in teaching the team which special data types were used in the project, and why).
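
A rough sketch of what that extra effort might look like (the fallback branch, and the assumption that long is 32 bits on the pre-C99 platform, are illustrative, not prescriptive):

/* Use the exact-width types where available (C99), otherwise fall back
   to a project-wide typedef. */
#if __STDC_VERSION__ >= 199901L
#include <stdint.h>
#else
typedef long int32_t;       /* assumes long is 32 bits on this pre-C99 platform */
typedef long int_least32_t;
#endif

int_least32_t counter = 0;  /* guaranteed to hold values that need up to 32 bits */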

See also this older SO question, whose top answer says that most people don't need that degree of portability.

And as to your example of the GUID structure: the structure shown seems mostly fine; it uses data types that are guaranteed to be large enough for each of the parts on every ANSI-compliant platform. So even if someone tries to use this structure for writing portable code, that would be perfectly possible.

As you noted yourself, if someone tried to use this structure as a spec for a GUID, they could complain that it is imprecise to some degree and that it requires reading the documentation in full to get an unambiguous spec. This is one of the less frequent cases where the maximum size of the types may matter.
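
If one wanted the widths spelled out in the type itself, a fixed-width restatement could look like this (my own sketch, not Microsoft's definition):

#include <stdint.h>

typedef struct _GUID_FIXED {
  uint32_t Data1;
  uint16_t Data2;
  uint16_t Data3;
  uint8_t  Data4[8];
} GUID_FIXED;

/* The fields add up to 128 bits, though the standard still does not
   strictly guarantee that sizeof(GUID_FIXED) is exactly 16 bytes,
   because padding remains implementation-defined. */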

Other problems could arise when the content of such a struct is formatted as a string, serialized to binary, stored, or transmitted somewhere while making assumptions about the maximum size of each individual field, about the total size being exactly 128 bits, about endianness, or about the precise binary encoding of those data types. But since the documentation of the GUID struct does not make any promises about the underlying binary representation, one should not make any such assumptions when trying to write portable code.
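
For instance, serializing field by field with an explicitly chosen byte order removes any dependence on the in-memory layout; a minimal sketch, assuming the fixed-width variant above and an arbitrarily chosen little-endian wire order:

#include <stdint.h>
#include <string.h>

/* Write a GUID into exactly 16 bytes, little-endian for the multi-byte
   fields, independent of the host's endianness and struct padding. */
static void guid_to_bytes(const GUID_FIXED *g, uint8_t out[16])
{
    out[0] = (uint8_t)(g->Data1);
    out[1] = (uint8_t)(g->Data1 >> 8);
    out[2] = (uint8_t)(g->Data1 >> 16);
    out[3] = (uint8_t)(g->Data1 >> 24);
    out[4] = (uint8_t)(g->Data2);
    out[5] = (uint8_t)(g->Data2 >> 8);
    out[6] = (uint8_t)(g->Data3);
    out[7] = (uint8_t)(g->Data3 >> 8);
    memcpy(&out[8], g->Data4, 8);
}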