An approach I've used: count the number of leading 1 bits, say n. The size of the number is then 2^n bytes (including the leading 1 bits). Take the bits after the first 0 bit as an integer, and add one more than the maximum value that can be represented by this encoding in 2^(n-1) bytes.
Thus,
0 = 0b00000000
...
127 = 0b01111111
128 = 0b1000000000000000
...
16511 = 0b1011111111111111
16512 = 0b11000000000000000000000000000000
...
536887423 = 0b11011111111111111111111111111111
536887424 = 0b1110000000000000000000000000000000000000000000000000000000000000
...
1152921505143734399 = 0b1110111111111111111111111111111111111111111111111111111111111111
1152921505143734400 = 0b111100000000000000000000000000000000000000000000 ...
This scheme allows any non-negative value to be represented in exactly one way.
(Equivalently, use the number of leading 0 bits.)
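To make the scheme concrete, here is a sketch of a decoder in Java (the class and method names are my own, and it handles only values that fit in a signed long; larger encodings would need BigInteger):

```java
public class PrefixVarInt {
    // Decode the scheme described above: n leading 1 bits means a
    // 2^n-byte value; the bits after the first 0 bit are the payload,
    // and the offset added is one more than the maximum value
    // representable in 2^(n-1) bytes.
    static long decode(byte[] bytes) {
        int first = bytes[0] & 0xFF;
        int n = 0;
        while (n < 7 && ((first >> (7 - n)) & 1) == 1) n++; // count leading 1 bits
        long value = first & ((1L << (7 - n)) - 1);         // payload bits of first byte
        for (int i = 1; i < bytes.length; i++) {
            value = (value << 8) | (bytes[i] & 0xFF);
        }
        long offset = 0;
        for (int k = 0; k < n; k++) {
            // a 2^k-byte encoding has 8*2^k - (k+1) payload bits
            offset += 1L << (8 * (1 << k) - (k + 1));
        }
        return value + offset;
    }

    public static void main(String[] args) {
        System.out.println(decode(new byte[] {0x7F}));                     // 127
        System.out.println(decode(new byte[] {(byte) 0x80, 0x00}));        // 128
        System.out.println(decode(new byte[] {(byte) 0xBF, (byte) 0xFF})); // 16511
    }
}
```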
The SAM approach is actually somewhat similar to what Scala (and C++11) do with anonymous functions (created with Scala's => operator or C++11's []() lambda syntax).
The first question to answer on the Java side is whether to have the return type of a lambda statement be a new primitive type, like int or byte, or an object type of some sort. In Scala, there are no primitive types -- even an integer is an object of class Int -- and functions are no different, being objects of class Function1, Function2, and so forth, depending on the number of arguments the function takes.
C++11, Ruby, and Python similarly have lambda expressions which return an object that is callable in an explicit or implicit way. The returned object has some standard method (such as Ruby's #call) which can be used to call it as a function. C++11, for instance, uses a std::function type which overloads operator(), so that calls of the object's call method even look like function calls, textually. :-)
Where the new proposal for Java gets messy is in the use of structural typing to allow assigning such a method to another object, such as a Comparator, which has a single main method with a different name. While this is conceptually icky in some ways, it does mean that the resulting object can be passed to existing functions which take an object to represent a callback, a comparator, and so on, and expect to be able to call a well-defined single method such as #compare or #callback. C++'s trick of overloading operator() neatly dodged this issue even before C++11, since all such callback objects were callable the same way, and thus the STL required no adjustment to allow C++11 lambdas to be used with sort and so on. Java, not having used a standard naming for such objects in the past (perhaps because no trick like operator overloading made a single approach obvious), isn't so lucky, so this hack spares them from having to change a whole lot of existing APIs.
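For concreteness, this is how it plays out in the Java 8 that eventually shipped: the same lambda expression is typed by its context as whatever single-abstract-method interface is expected, so it slots into pre-lambda APIs like Arrays.sort unchanged (a minimal sketch):

```java
import java.util.Arrays;
import java.util.Comparator;

public class SamExample {
    public static void main(String[] args) {
        // One lambda expression, typed by context as a Comparator;
        // the single abstract method it implements is #compare:
        Comparator<String> byLength =
                (a, b) -> Integer.compare(a.length(), b.length());

        // An existing pre-lambda API accepts it unchanged:
        String[] words = {"banana", "fig", "pear"};
        Arrays.sort(words, byLength);
        System.out.println(Arrays.toString(words)); // [fig, pear, banana]
    }
}
```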
Best Answer
Like so many aspects of language design, it comes down to a trade-off of elegance against performance (not to mention some historical influence from earlier languages).
Alternatives
It is certainly possible (and quite simple) to make a programming language that has just a single type of natural numbers, nat. Almost all programming languages used for academic study (e.g. PCF, System F) have this single number type, which is the more elegant solution, as you surmised. But language design in practice is not just about elegance; we must also consider performance (the extent to which performance is considered depends on the intended application of the language). Performance comprises both time and space constraints.
Space constraints
Letting the programmer choose the number of bytes up front can save space in memory-constrained programs. If all your numbers are going to be less than 256, then you can use 8 times as many bytes as longs, or use the saved storage for more complex objects. The standard Java application developer does not have to worry about these constraints, but they do come up.
Efficiency
Even if we ignore space, we are still constrained by the CPU, which only has instructions that operate on a fixed number of bytes (8 bytes on a 64-bit architecture). That means even providing just a single 8-byte long type would make the implementation of the language significantly simpler than having an unbounded natural number type, by being able to map arithmetic operations directly to a single underlying CPU instruction. If you allow the programmer to use arbitrarily large numbers, then a single arithmetic operation must be mapped to a sequence of complex machine instructions, which would slow down the program. This is point (1) that you brought up.
Floating-point types
The discussion so far has only concerned integers. Floating-point types are a complex beast, with extremely subtle semantics and edge cases. Thus, even though we could easily replace int, long, short, and byte with a single nat type, it is not clear what the type of floating-point numbers even is. They aren't real numbers, obviously, as real numbers cannot exist in a programming language. They aren't quite rational numbers, either (though it's straightforward to create a rational type if desired). Basically, IEEE decided on a way to kinda sorta approximate real numbers, and all languages (and programmers) have been stuck with them ever since.
Finally:
This isn't a valid reason. Firstly, I can't think of any situations in which types could naturally encode numerical bounds; and even where they could, the chances are astronomically low that the bounds the programmer wants to enforce would correspond exactly to the size of any of the primitive types.