Why Java Has Primitives for Different Size Numbers

data-types, java, language-design, numbers

In Java there are primitive types for byte, short, int and long, and likewise for float and double. Why is it necessary for the programmer to specify how many bytes a primitive value should use? Couldn't the size just be determined dynamically, depending on how big the number is?

There are two reasons I can think of:

  1. Sizing the data dynamically would mean the size would also need to be able to change at runtime, which could cause performance issues.
  2. Perhaps the programmer wouldn't want someone to be able to use a bigger number than a certain size and this lets them limit it.

I still think there could've been a lot to gain by simply using a single int and float type. Was there a specific reason Java decided not to go this route?

Best Answer

Like so many aspects of language design, it comes down to a trade-off between elegance and performance (not to mention some historical influence from earlier languages).

Alternatives

It is certainly possible (and quite simple) to make a programming language that has just a single type of natural numbers, nat. Almost all programming languages used for academic study (e.g. PCF, System F) have this single number type, which is the more elegant solution, as you surmised. But practical language design is not just about elegance; performance matters too (how much depends on the language's intended application), and performance covers both time and space constraints.
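
To get a feel for what such a single unbounded number type looks like in practice, here is a minimal sketch using java.math.BigInteger, which is Java's closest analogue to an unbounded nat:

```java
import java.math.BigInteger;

public class UnboundedNat {
    public static void main(String[] args) {
        // BigInteger behaves like the single unbounded integer type of
        // languages such as PCF: it never overflows, it just grows.
        BigInteger n = BigInteger.ONE;
        for (int i = 0; i < 100; i++) {
            n = n.multiply(BigInteger.valueOf(2)); // 2^100 fits with no special care
        }
        System.out.println(n); // 1267650600228229401496703205376
    }
}
```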

Space constraints

Letting the programmer choose the number of bytes up-front can save space in memory-constrained programs. If all your numbers are going to be less than 256, then you can fit eight times as many bytes as longs in the same space, or use the saved storage for more complex objects. The standard Java application developer does not have to worry about these constraints, but they do come up.
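
As a rough illustration (assuming the usual 1-byte byte and 8-byte long payloads, and ignoring array headers):

```java
public class StorageFootprint {
    public static void main(String[] args) {
        // One million readings, each known to fit in the range 0..255.
        byte[] compact  = new byte[1_000_000]; // ~1 MB of payload
        long[] wasteful = new long[1_000_000]; // ~8 MB of payload for the same data

        // Same logical content, roughly an 8x difference in heap usage,
        // which matters in caches and on small devices.
        System.out.println(compact.length + " bytes vs " + (wasteful.length * 8L) + " bytes");
    }
}
```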

Efficiency

Even if we ignore space, we are still constrained by the CPU, which only has instructions that operate on a fixed number of bytes (8 bytes on a 64-bit architecture). That means even providing just a single 8-byte long type would make the implementation of the language significantly simpler than an unbounded natural number type, because each arithmetic operation can be mapped directly to a single underlying CPU instruction. If you allow the programmer to use arbitrarily large numbers, then a single arithmetic operation must be mapped to a sequence of complex machine instructions, which would slow down the program. This is point (1) that you brought up.
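
A small sketch of the contrast, assuming a JIT that compiles int arithmetic down to native instructions (as HotSpot does):

```java
import java.math.BigInteger;

public class AdditionCost {
    public static void main(String[] args) {
        // Fixed-width addition: compiled down to a single machine ADD,
        // wrapping silently on overflow.
        int a = 1_000_000_000;
        int b = 2_000_000_000;
        int sum = a + b;                       // overflows to -1294967296
        System.out.println(sum);

        // Unbounded addition: a method call that allocates a new object and
        // loops over an internal digit array -- many instructions per "+".
        BigInteger bigSum = BigInteger.valueOf(a).add(BigInteger.valueOf(b));
        System.out.println(bigSum);            // 3000000000
    }
}
```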

Floating-point types

The discussion so far has only concerned integers. Floating-point types are a complex beast, with extremely subtle semantics and edge cases. Thus, even though we could easily replace int, long, short, and byte with a single nat type, it is not clear what the type of floating-point numbers even is. They aren't real numbers, obviously, as real numbers cannot be represented exactly on a finite machine. They aren't quite rational numbers, either (though it's straightforward to create a rational type if desired). Basically, IEEE decided on a way to approximate real numbers, and all languages (and programmers) have been stuck with it ever since.
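
A few of those edge cases are visible in plain Java:

```java
public class FloatingPointQuirks {
    public static void main(String[] args) {
        // Decimal fractions like 0.1 have no exact binary representation,
        // so IEEE 754 arithmetic accumulates tiny rounding errors.
        System.out.println(0.1 + 0.2);        // 0.30000000000000004
        System.out.println(0.1 + 0.2 == 0.3); // false

        // The standard also defines special values with unusual semantics.
        System.out.println(1.0 / 0.0);                // Infinity
        System.out.println(0.0 / 0.0);                // NaN
        System.out.println(Double.NaN == Double.NaN); // false
    }
}
```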

Finally:

Perhaps the programmer wouldn't want someone to be able to use a bigger number than a certain size and this lets them limit it.

This isn't a valid reason. I can hardly think of a situation in which a primitive type naturally encodes the numerical bound a program actually cares about; the chances are astronomically low that the bound the programmer wants to enforce corresponds exactly to the range of one of the primitive types.
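
For illustration, consider a hypothetical percentage field with a 0..100 bound: no primitive range matches it, and narrowing to byte merely wraps the value rather than rejecting it.

```java
public class BoundsAreNotEnforced {
    static byte toPercentage(int value) {
        // The real bound (0..100) has to be checked by hand; no primitive
        // type's range happens to match it.
        if (value < 0 || value > 100) {
            throw new IllegalArgumentException("percentage out of range: " + value);
        }
        return (byte) value;
    }

    public static void main(String[] args) {
        // Narrowing alone enforces nothing: the value just wraps.
        int tooBig = 300;
        System.out.println((byte) tooBig);    // prints 44, no error raised

        System.out.println(toPercentage(75)); // prints 75
    }
}
```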