An approach I've used: count the number of leading 1 bits, say n. The size of the number is then 2^n bytes (including the leading 1 bits). Take the bits after the first 0 bit as an integer, and add one more than the maximum value that can be represented by this encoding in 2^(n-1) bytes.
Thus,
0 = 0b00000000
...
127 = 0b01111111
128 = 0b1000000000000000
...
16511 = 0b1011111111111111
16512 = 0b11000000000000000000000000000000
...
536887423 = 0b11011111111111111111111111111111
536887424 = 0b1110000000000000000000000000000000000000000000000000000000000000
...
1152921505143734399 = 0b1110111111111111111111111111111111111111111111111111111111111111
1152921505143734400 = 0b111100000000000000000000000000000000000000000000 ...
This scheme allows any non-negative value to be represented in exactly one way.
(Equivalently, use the number of leading 0 bits.)
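To make the scheme concrete, here is a sketch of a decoder in Java (the class and method names are my own, and it handles only values that fit in a signed long; larger encodings would need BigInteger):

```java
public class PrefixVarInt {
    // Decode the scheme described above: n leading 1 bits means a
    // 2^n-byte value; the bits after the first 0 bit are the payload,
    // and the offset added is one more than the maximum value
    // representable in 2^(n-1) bytes.
    static long decode(byte[] bytes) {
        int first = bytes[0] & 0xFF;
        int n = 0;
        while (n < 7 && ((first >> (7 - n)) & 1) == 1) n++; // count leading 1 bits
        long value = first & ((1L << (7 - n)) - 1);         // payload bits of first byte
        for (int i = 1; i < bytes.length; i++) {
            value = (value << 8) | (bytes[i] & 0xFF);
        }
        long offset = 0;
        for (int k = 0; k < n; k++) {
            // a 2^k-byte encoding has 8*2^k - (k+1) payload bits
            offset += 1L << (8 * (1 << k) - (k + 1));
        }
        return value + offset;
    }

    public static void main(String[] args) {
        System.out.println(decode(new byte[] {0x7F}));                     // 127
        System.out.println(decode(new byte[] {(byte) 0x80, 0x00}));        // 128
        System.out.println(decode(new byte[] {(byte) 0xBF, (byte) 0xFF})); // 16511
    }
}
```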
The SAM approach is actually somewhat similar to what Scala (and C++11) do with anonymous functions (created with Scala's => operator or C++11's []() lambda syntax).
The first question to answer on the Java side is whether to have the return type of a lambda statement be a new primitive type, like int or byte, or an object type of some sort. In Scala, there are no primitive types -- even an integer is an object of class Int -- and functions are no different, being objects of class Function1, Function2, and so forth, depending on the number of arguments the function takes.
C++11, Ruby, and Python similarly have lambda expressions which return an object that is callable in an explicit or implicit way. The returned object has some standard method (such as Ruby's #call) which can be used to call it as a function. C++11, for instance, uses a std::function type which overloads operator(), so that calls of the object's call method even look like function calls, textually. :-)
Where the new proposal for Java gets messy is in the use of structural typing to allow assigning such a method to another object, such as a Comparator, which has a single main method with a different name. While this is conceptually icky in some ways, it does mean that the resulting object can be passed to existing functions which take an object to represent a callback, a comparator, and so on, and expect to be able to call a well-defined single method such as #compare or #callback. C++'s trick of overloading operator() neatly dodged this issue even before C++11, since all such callback objects were callable the same way, and thus the STL required no adjustment to allow C++11 lambdas to be used with sort and so on. Java, not having used a standard naming for such objects in the past (perhaps because no trick like operator overloading made a single approach obvious), isn't so lucky, so this hack spares them from having to change a whole lot of existing APIs.
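For concreteness, this is how it plays out in the Java 8 that eventually shipped: the same lambda expression is typed by its context as whatever single-abstract-method interface is expected, so it slots into pre-lambda APIs like Arrays.sort unchanged (a minimal sketch):

```java
import java.util.Arrays;
import java.util.Comparator;

public class SamExample {
    public static void main(String[] args) {
        // One lambda expression, typed by context as a Comparator;
        // the single abstract method it implements is #compare:
        Comparator<String> byLength =
                (a, b) -> Integer.compare(a.length(), b.length());

        // An existing pre-lambda API accepts it unchanged:
        String[] words = {"banana", "fig", "pear"};
        Arrays.sort(words, byLength);
        System.out.println(Arrays.toString(words)); // [fig, pear, banana]
    }
}
```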
Best Answer
Like so many aspects of language design, it comes down to a trade-off of elegance against performance (not to mention some historical influence from earlier languages).
Alternatives
It is certainly possible (and quite simple) to make a programming language that has just a single type of natural numbers, nat. Almost all programming languages used for academic study (e.g. PCF, System F) have this single number type, which is the more elegant solution, as you surmised. But language design in practice is not just about elegance; we must also consider performance (the extent to which performance is considered depends on the intended application of the language). Performance comprises both time and space constraints.
Space constraints
Letting the programmer choose the number of bytes up front can save space in memory-constrained programs. If all your numbers are going to be less than 256, then you can use 8 times as many bytes as longs, or use the saved storage for more complex objects. The standard Java application developer does not have to worry about these constraints, but they do come up.
Efficiency
Even if we ignore space, we are still constrained by the CPU, which only has instructions that operate on a fixed number of bytes (8 bytes on a 64-bit architecture). That means even providing just a single 8-byte long type would make the implementation of the language significantly simpler than having an unbounded natural number type, by being able to map arithmetic operations directly to a single underlying CPU instruction. If you allow the programmer to use arbitrarily large numbers, then a single arithmetic operation must be mapped to a sequence of complex machine instructions, which would slow down the program. This is point (1) that you brought up.
Floating-point types
The discussion so far has only concerned integers. Floating-point types are a complex beast, with extremely subtle semantics and edge cases. Thus, even though we could easily replace int, long, short, and byte with a single nat type, it is not clear what the type of floating-point numbers even is. They aren't real numbers, obviously, as real numbers cannot exist in a programming language. They aren't quite rational numbers, either (though it's straightforward to create a rational type if desired). Basically, IEEE decided on a way to kinda sorta approximate real numbers, and all languages (and programmers) have been stuck with them ever since.
Finally:
This isn't a valid reason. Firstly, I can't think of any situations in which types could naturally encode numerical bounds; and even where they could, the chances are astronomically low that the bounds the programmer wants to enforce would correspond exactly to the size of any of the primitive types.