Yes, it pretty much comes down to efficiency. But you seem to be underestimating the impact (or overestimating how well various optimizations work).
First, it's not just "spatial overhead". Making primitives boxed/heap-allocated has performance costs too. There's the additional pressure on the GC to allocate and collect those objects. This goes doubly if the "primitive objects" are immutable, as they should be. Then there are more cache misses (both because of the indirection and because less data fits into a given amount of cache). Plus the bare fact that "load the address of an object, then load the actual value from that address" takes more instructions than "load the value directly".
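To make the cost concrete, here is a small Java sketch (class and method names are my own, invented for illustration) contrasting a primitive array with its boxed counterpart. Both compute the same result; the difference is entirely in memory layout: the boxed version pays for one heap allocation per element, plus a pointer dereference (and potential cache miss) on every read.

```java
public class BoxingCost {
    // Primitive ints: one contiguous block of memory, no extra GC
    // objects, each load is a direct read from the array.
    static long sumPrimitive(int[] values) {
        long sum = 0;
        for (int v : values) sum += v;
        return sum;
    }

    // Boxed Integers: the array holds references; every element is a
    // separate heap object, so each iteration follows a pointer, and
    // the GC had to allocate (and will later collect) N objects.
    static long sumBoxed(Integer[] values) {
        long sum = 0;
        for (Integer v : values) sum += v; // unbox: load address, then load value
        return sum;
    }
}
```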
Second, escape analysis isn't go-faster fairy dust. It only applies to values that, well, don't escape. It's certainly nice to optimize local calculations (such as loop counters and intermediate results) and it will give measurable benefits. But the large majority of values live in the fields of objects and arrays. Granted, those can be subject to escape analysis themselves, but as they're usually mutable reference types, any aliasing of them presents a significant challenge to the escape analysis, which now has to prove that those aliases (1) don't escape either, and (2) don't make a difference for the purpose of eliminating allocations.
Given that calling any method (including getters) or passing an object as an argument to any other method can let the object escape, you'll need interprocedural analysis in all but the most trivial cases. This is far more expensive and complicated.
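A hypothetical sketch of the difficulty (all names are mine, not from any real analysis): the first counter never leaves its method, so escape analysis can keep it in a register with no allocation; the second value is passed to another method, so before the compiler can eliminate anything it must analyze that callee too, and here the callee really does let the value escape into a global.

```java
public class EscapeDemo {
    // The loop counter and accumulator never escape this method:
    // even in a language where ints are objects, escape analysis
    // could keep them in registers with no heap allocation.
    static int localOnly(int n) {
        int counter = 0;
        for (int i = 0; i < n; i++) counter += i;
        return counter;
    }

    static java.util.List<Object> sink = new java.util.ArrayList<>();

    // The argument escapes into a global list. To prove otherwise,
    // the compiler would have to analyze this method's body, i.e.
    // perform interprocedural analysis.
    static void record(Object o) { sink.add(o); }

    static int escapes(int n) {
        Object boxed = Integer.valueOf(n); // merely passing the value to
        record(boxed);                     // record() is enough to make it
        return n;                          // escape; no local proof suffices
    }
}
```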
And then there are cases where things really do escape and can't reasonably be optimized away. Quite a few of them, actually, if you consider how often C programmers go through the trouble of heap-allocating things. When an object containing an int escapes, escape analysis ceases to apply to the int as well. Say goodbye to efficient primitive fields.
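For instance (a sketch with invented names): the Point below is returned to an unknown caller, so it genuinely escapes and must be heap-allocated. With dedicated primitives its fields are two flat machine words inside that one object; in a language where ints are themselves objects, each field would be yet another heap allocation behind another indirection, and escape analysis can do nothing about it.

```java
public class EscapedField {
    static final class Point {
        final int x, y; // with real primitives: two flat words in the object;
                        // with "primitive objects": two more heap allocations
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    static Point makePoint(int x, int y) {
        return new Point(x, y); // the Point escapes to the caller, so
                                // neither it nor its fields can be
                                // scalar-replaced away
    }
}
```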
This ties into another point: The analyses and optimizations required are seriously complicated and an active area of research. It's debatable whether any language implementation ever achieved the degree of optimization you suggest, and even if so, it's been a rare and herculean effort. Surely standing on the shoulders of these giants is easier than being a giant yourself, but it's still far from trivial. Don't expect competitive performance any time in the first few years, if ever.
That is not to say such languages can't be viable. Clearly they are. Just don't assume it will be line-for-line as fast as languages with dedicated primitives. In other words, don't delude yourself with visions of a sufficiently smart compiler.
In your example, you don't really show the same message; you show two different messages that happen to have the same name. Polymorphism requires that the sender of a message can send it without knowing the exact recipient. Without seeing evidence that the caller can do something like

    shape.draw()

without knowing whether `shape` contains a circle or a rectangle, you may or may not have actual polymorphism. They could be as unrelated as `circle.draw()` and `weapon.draw()`.

They don't necessarily have to both implement the same nominal interface. The language could support structural typing or compile-time templating, and it would still be called polymorphism, as long as the caller doesn't care who the callee is.
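A minimal Java sketch of the distinction (the `Shape` interface and all class names are invented for illustration): the caller compiles against `draw()` without knowing the concrete type, which is what makes it polymorphism. A `Weapon` that merely shares the method name has no relationship to `Shape`, and the caller could never be handed one.

```java
public class PolyDemo {
    interface Shape { String draw(); }

    static final class Circle implements Shape {
        public String draw() { return "circle"; }
    }
    static final class Rectangle implements Shape {
        public String draw() { return "rectangle"; }
    }

    // The polymorphic caller: it neither knows nor cares
    // which concrete Shape it receives.
    static String render(Shape shape) {
        return shape.draw();
    }

    // Same method name, unrelated meaning: no common interface,
    // and render() cannot accept it. Not polymorphism.
    static final class Weapon {
        String draw() { return "unsheathed"; }
    }
}
```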