Are the Stack and Heap hardware, OS, or language-specific concepts

heaplanguagesstack

In languages such as C or Java we have the concepts of the Stack and the Heap.

Are these abstractions the particular language runtime/compiler creates over the plain-old RAM? Or are these concepts inherent to the OS?

In other words, do the Stack and Heap "physically" generally exist in any application regardless of the programming language or technology it was built with? Or are these abstractions managed by e.g. the C language compiler/runtime, and may not actually exist in other language implementations?

Best Answer

All of the above. It's complicated.

Some CPU architectures, especially x86, include a dedicated stack pointer register and instructions that allow you to easily push/pop values on that stack.

The Java Virtual Machine is a specification that defines the operations of the JVM in terms of a stack. Most instructions push and pop operands on that stack. Actual implementations must implement these instructions, but they don't actually have to use a stack here. Instead, a JIT-compiling JVM will typically place operands in registers as far as possible.

Systems/platforms define an ABI (application binary interface). This includes things such as calling conventions. A calling convention covers details such as:

  • which registers must be saved before the call
  • which registers have to be preserved by the called function
  • where arguments are placed (usually: first arguments in registers, other arguments on the stack)
  • in which order arguments are placed on a stack
  • where return values are placed
  • where execution shall continue after the call completes (usually: a return address is saved on the stack by the caller)

The calling convention depends on many factors:

  • First and foremost, the calling convention depends on the CPU architecture because the architecture dictates which registers are available. Often, a processor vendor will suggest a calling convention in the ISA manual.

  • The operating system typically defines an ABI. Strictly speaking the only mandatory part of the ABI is the part about system calls. However, operating systems are more than a kernel. If you want to use libraries installed on a system, you need to use the same calling convention they were compiled with.

  • The compiler defines an ABI. As an extension of the above point, software should generally be compiled with the same compiler as the libraries it wants to link. Calling conventions necessarily involve implementation-specific features such as dealing with multiple return values, or managing exceptions.

  • Some languages define an ABI for that language which fosters interoperability. This is still CPU-dependent, of course.

  • Some languages/compilers make it possible to select an ABI/calling convention on a per-function basis. This is sometimes called a “foreign function interface”. The system's C calling convention is generally understood by everyone. For example, if I want to write a library in C++ but want to allow anyone to link it without having to support my compiler's C++ ABI, I might declare the functions as extern "C".

Most calling conventions use a stack, because stacks are extremely convenient: special CPU instructions make them very efficient, they are a super easy and fast way to allocate small amounts of temporary memory, and the lifetime of the allocated memory matches the nested invocation of functions.

A system that doesn't use pre-compiled binaries does not need a defined ABI: the calling convention is just an implementation detail. For example, an interpreter might use data structures in the host language to represent function calls. This might still be a stack, but it's generally not identical with the host system's stack. The Perl and CPython interpreters use C data structures as the “stack”.

Some programming languages are not a good fit for a call stack model because functions do not have that clear hierarchy that a called function terminates before the calling function, or that the lifetime of data on the call stack is identical with the time that the function is active. In particular:

  • closures/inner functions may extend the lifetime of data that would usually be stored on the stack
  • coroutines are called and suspended multiple times while keeping consistent state
    • (each “thread” needs its own stack, and pre-allocating space for the whole stack may use too much memory)
  • continuations mean that a function doesn't necessarily return to the calling function, which makes cleanup of the calling function's stack frame difficult

These problems generally require that instead of a single, simple call stack, the stack data is divided into multiple parts that are linked with pointers, and garbage-collected where necessary. For example, closures and coroutines are closely related to OOP objects.

So yes, the stack does depend on the operating system and generally exists for real in memory, but some languages might not really use that stack, and how the stack is used differs, and there might be multiple stacks, and the stack might be fragmented into multiple parts, and no stack might exist in any recognizable form.

The “heap” refers to other memory that a program can manage freely. The operating system doesn't know about the heap, it just knows about virtual address space that's mapped into a process. The language runtime manages this memory, for example malloc() in C. However, there are many different memory management techniques and not all of them should be described as a “heap” with its implied problems like memory fragmentation. In particular, arena allocators allow fixed-sized objects to be allocated efficiently. With compacting garbage collectors, memory allocation can be as fast as stack allocation and no memory fragmentation issues arise, but the GC algorithm implies a runtime overhead.

Some parts of the virtual address space are neither stack nor heap, but e.g. data segments, program memory, mmap()ed files or shared memory, or memory that represents other resources (like a frame buffer).

While writing normal software (anything other than compilers, linkers, and operating systems), and when not debugging arcane memory corruption issues in unsafe languages like C++, it is best to treat the presence or absence of a stack or heap as an implementation detail. It is sufficient to know that “compiled” implementations can typically allocate space for temporary/local variables very cheaply. In C/C++ there are also considerations about object lifetimes (e.g. don't return a pointer to an automatic variable).