Depending on the language, it may not be necessary to use a call stack. Call stacks are only necessary in languages that allow recursion or mutual recursion. If the language does not allow recursion, then only one invocation of any procedure can be active at any moment, and that procedure's local variables can be statically allocated. Such languages do have to make provision for context switches, such as interrupt handling, but even that does not require a stack.
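To make the static-allocation idea concrete, here is a rough Python sketch (not real FORTRAN IV, and the names are invented): each procedure's locals live in one fixed block of storage decided "at compile time", reused on every call, with no stack frame ever allocated.

```python
# Statically allocated locals: exactly one copy of each local variable,
# shared by every invocation of the procedure. A second simultaneous
# activation would clobber this storage -- which is precisely why such
# languages must forbid recursion.
AVERAGE_LOCALS = {"total": 0, "count": 0}

def average(values):
    locs = AVERAGE_LOCALS          # the procedure's one-and-only "frame"
    locs["total"] = sum(values)
    locs["count"] = len(values)
    return locs["total"] / locs["count"]

print(average([2, 4, 6]))  # 4.0
```

Because the storage is fixed, the compiler can turn every reference to `total` into a reference to one absolute memory address, with no frame pointer arithmetic at all.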
Refer to FORTRAN IV (and earlier) and early versions of COBOL for examples of languages that do not require call stacks.
Refer to the Control Data 6600 (and earlier Control Data machines) for an example of a highly successful early supercomputer that did not provide direct hardware support for call stacks. Refer to the PDP-8 for an example of a very successful early minicomputer that did not support call stacks.
As far as I know, the Burroughs B5000 stack machines were the first machines with hardware call stacks. The B5000 machines were designed from the ground up to run ALGOL, which required recursion. They also had one of the first descriptor-based architectures, which laid groundwork for capability architectures.
As far as I know, it was the PDP-6 (which grew into the DEC-10) that popularized call stack hardware, when the hacker community at MIT took delivery of one and discovered that the PUSHJ (Push Return Address and Jump) operation allowed the decimal print routine to be reduced from 50 instructions to 10.
The most basic function-call semantics in a language that allows recursion require capabilities that match nicely with a stack. If that's all you need, then a basic stack is a good, simple match. If you need more than that, then your data structure has to do more.
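The match between recursion and a stack can be made explicit. This minimal sketch replaces the hardware call stack with an ordinary Python list: each "call" pushes a frame, each "return" pops one, and last-in-first-out order is exactly what nested calls need.

```python
def factorial(n):
    """Iterative factorial using an explicit stack of call frames."""
    stack = []                 # our hand-rolled call stack
    while n > 1:               # "calling" phase: push one frame per call
        stack.append(n)
        n -= 1
    result = 1
    while stack:               # "returning" phase: pop frames in LIFO order
        result *= stack.pop()
    return result

print(factorial(5))  # 120
```

The deepest "call" is the first to "return" — that strict LIFO discipline is the whole reason a simple stack suffices for plain function calls.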
The best example of needing more that I have encountered is the "continuation", the ability to suspend a computation in the middle, save it as a frozen bubble of state, and fire it off again later, possibly many times. Continuations became popular in the Scheme dialect of LISP, as a way to implement, among other things, error exits. Continuations require the ability to snapshot the current execution environment, and reproduce it later, and a stack is somewhat inconvenient for that.
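Python has no direct equivalent of Scheme's `call/cc`, but a rough sketch of the idea is possible in continuation-passing style: the "rest of the computation" becomes an ordinary closure, which can be saved and fired again later, even more than once — something a mutable stack that pops frames on return cannot easily support.

```python
saved = []  # continuations captured mid-computation, as frozen closures

def add_then(x, y, k):
    """Compute x + y, then hand the result to the continuation k."""
    saved.append(k)          # snapshot the rest of the computation
    k(x + y)                 # run it once now...

results = []
add_then(2, 3, lambda v: results.append(v * 10))
saved[0](7)                  # ...and resume it again later with a new value
print(results)  # [50, 70]
```

Note that the continuation outlives the call that created it. A conventional stack frame would have been popped and reused by then, which is exactly why continuation-friendly implementations often allocate frames on the heap instead.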
Abelson & Sussman's "Structure and Interpretation of Computer Programs" goes into some detail on continuations.
Best Answer
Here are a couple of reasons to think about:
Using human-readable assembly language would waste space on disk and in memory. That has an impact on caching, and therefore on performance. In your example, the instruction 'push' takes up four bytes. Why not compress the program by using one-byte tokens for all instructions instead of the human-readable strings?
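A toy illustration of the space argument (the opcode numbers here are invented): mapping each mnemonic to a one-byte token shrinks the encoded program substantially compared with the readable text.

```python
# Invented one-byte opcodes for a hypothetical VM.
OPCODES = {"push": 0x01, "pop": 0x02, "add": 0x03}

text_program = ["push", "push", "add", "pop"]

text_size = sum(len(m) for m in text_program)             # mnemonics as ASCII
byte_size = len(bytes(OPCODES[m] for m in text_program))  # one byte per opcode

print(text_size, byte_size)  # 14 4
```

Even before counting operands, the tokenized form is a fraction of the size, and smaller code means more of the program fits in cache.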
It also wastes cycles on the processor. Your VM probably has at least two instruction mnemonics that start with 'p'. In order for your VM to figure out whether an instruction is 'push' or 'pop', it has to compare at least two bytes. It's much more efficient if each instruction can be uniquely identified by looking at a single byte.

The argument to your instructions is a string representing a number. That string has to be converted to a binary format appropriate for the underlying CPU before it can be used in arithmetic, and that conversion takes dozens of instructions all by itself. Why do it every time the program is run? It's much more efficient to do it in a one-time pass when the byte code is created.
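Both points can be sketched in one small VM (the opcode numbers and two-instruction set are invented for illustration): a one-time assembly pass turns mnemonics into single-byte opcodes and converts numeric operands to binary with `struct.pack`, so the interpreter loop dispatches on one byte and never parses a string.

```python
import struct

PUSH, ADD = 0x01, 0x03   # one-byte opcodes: dispatch needs a single compare

def assemble(lines):
    """One-time pass: convert text like 'push 2' into packed bytecode."""
    code = bytearray()
    for line in lines:
        parts = line.split()
        if parts[0] == "push":
            code.append(PUSH)
            code += struct.pack("<i", int(parts[1]))  # string -> binary, once
        elif parts[0] == "add":
            code.append(ADD)
    return bytes(code)

def run(code):
    """Interpreter loop: no string comparison, no number parsing."""
    stack, pc = [], 0
    while pc < len(code):
        op = code[pc]; pc += 1
        if op == PUSH:                               # single-byte dispatch
            stack.append(struct.unpack_from("<i", code, pc)[0])
            pc += 4
        elif op == ADD:
            stack.append(stack.pop() + stack.pop())
    return stack[-1]

print(run(assemble(["push 2", "push 3", "add"])))  # 5
```

The expensive work — mnemonic lookup and `int(...)` conversion — happens exactly once, in `assemble`; `run` can replay the bytecode any number of times at full speed.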