The stack is the memory set aside as scratch space for a thread of execution. When a function is called, a block is reserved on the top of the stack for local variables and some bookkeeping data. When that function returns, the block becomes unused and can be used the next time a function is called. The stack is always reserved in a LIFO (last in first out) order; the most recently reserved block is always the next block to be freed. This makes it really simple to keep track of the stack; freeing a block from the stack is nothing more than adjusting one pointer.
The heap is memory set aside for dynamic allocation. Unlike the stack, there's no enforced pattern to the allocation and deallocation of blocks from the heap; you can allocate a block at any time and free it at any time. This makes it much more complex to keep track of which parts of the heap are allocated or freed at any given time; there are many custom heap allocators available to tune heap performance for different usage patterns.
Each thread gets a stack, while there's typically only one heap for the application (although it isn't uncommon to have multiple heaps for different types of allocation).
To answer your questions directly:
To what extent are they controlled by the OS or language runtime?
The OS allocates the stack for each system-level thread when the thread is created. Typically the OS is called by the language runtime to allocate the heap for the application.
What is their scope?
The stack is attached to a thread, so when the thread exits the stack is reclaimed. The heap is typically allocated at application startup by the runtime, and is reclaimed when the application (technically process) exits.
What determines the size of each of them?
The size of the stack is set when a thread is created. The size of the heap is set on application startup, but can grow as space is needed (the allocator requests more memory from the operating system).
What makes one faster?
The stack is faster because the access pattern makes it trivial to allocate and deallocate memory from it (a pointer/integer is simply incremented or decremented), while the heap has much more complex bookkeeping involved in an allocation or deallocation. Also, each byte in the stack tends to be reused very frequently which means it tends to be mapped to the processor's cache, making it very fast. Another performance hit for the heap is that the heap, being mostly a global resource, typically has to be multi-threading safe, i.e. each allocation and deallocation needs to be - typically - synchronized with "all" other heap accesses in the program.
A clear demonstration:
Image source: vikashazrati.wordpress.com
Stack allocation is much faster since all it really does is move the stack pointer.
Using memory pools, you can get comparable performance out of heap allocation, but that comes with a slight added complexity and its own headaches.
Also, stack vs. heap is not only a performance consideration; it also tells you a lot about the expected lifetime of objects.
Best Answer
Summary
All average times on this table are the same as their worst times except for Insert.
*
: everywhere in this answer, BST == Balanced BST, since unbalanced sucks asymptotically**
: using a trivial modification explained in this answer***
:log(n)
for pointer tree heap,n
for dynamic array heapAdvantages of binary heap over a BST
average time insertion into a binary heap is
O(1)
, for BST isO(log(n))
. This is the killer feature of heaps.There are also other heaps which reach
O(1)
amortized (stronger) like the Fibonacci Heap, and even worst case, like the Brodal queue, although they may not be practical because of non-asymptotic performance: Are Fibonacci heaps or Brodal queues used in practice anywhere?binary heaps can be efficiently implemented on top of either dynamic arrays or pointer-based trees, BST only pointer-based trees. So for the heap we can choose the more space efficient array implementation, if we can afford occasional resize latencies.
binary heap creation is
O(n)
worst case,O(n log(n))
for BST.Advantage of BST over binary heap
search for arbitrary elements is
O(log(n))
. This is the killer feature of BSTs.For heap, it is
O(n)
in general, except for the largest element which isO(1)
."False" advantage of heap over BST
heap is
O(1)
to find max, BSTO(log(n))
.This is a common misconception, because it is trivial to modify a BST to keep track of the largest element, and update it whenever that element could be changed: on insertion of a larger one swap, on removal find the second largest. Can we use binary search tree to simulate heap operation? (mentioned by Yeo).
Actually, this is a limitation of heaps compared to BSTs: the only efficient search is that for the largest element.
Average binary heap insert is
O(1)
Sources:
Intuitive argument:
In a binary heap, increasing the value at a given index is also
O(1)
for the same reason. But if you want to do that, it is likely that you will want to keep an extra index up-to-date on heap operations How to implement O(logn) decrease-key operation for min-heap based Priority Queue? e.g. for Dijkstra. Possible at no extra time cost.GCC C++ standard library insert benchmark on real hardware
I benchmarked the C++
std::set
(Red-black tree BST) andstd::priority_queue
(dynamic array heap) insert to see if I was right about the insert times, and this is what I got:So clearly:
heap insert time is basically constant.
We can clearly see dynamic array resize points. Since we are averaging every 10k inserts to be able to see anything at all above system noise, those peaks are in fact about 10k times larger than shown!
The zoomed graph excludes essentially only the array resize points, and shows that almost all inserts fall under 25 nanoseconds.
BST is logarithmic. All inserts are much slower than the average heap insert.
BST vs hashmap detailed analysis at: What data structure is inside std::map in C++?
GCC C++ standard library insert benchmark on gem5
gem5 is a full system simulator, and therefore provides an infinitely accurate clock with with
m5 dumpstats
. So I tried to use it to estimate timings for individual inserts.Interpretation:
heap is still constant, but now we see in more detail that there are a few lines, and each higher line is more sparse.
This must correspond to memory access latencies are done for higher and higher inserts.
TODO I can't really interpret the BST fully one as it does not look so logarithmic and somewhat more constant.
With this greater detail however we can see can also see a few distinct lines, but I'm not sure what they represent: I would expect the bottom line to be thinner, since we insert top bottom?
Benchmarked with this Buildroot setup on an aarch64 HPI CPU.
BST cannot be efficiently implemented on an array
Heap operations only need to bubble up or down a single tree branch, so
O(log(n))
worst case swaps,O(1)
average.Keeping a BST balanced requires tree rotations, which can change the top element for another one, and would require moving the entire array around (
O(n)
).Heaps can be efficiently implemented on an array
Parent and children indexes can be computed from the current index as shown here.
There are no balancing operations like BST.
Delete min is the most worrying operation as it has to be top down. But it can always be done by "percolating down" a single branch of the heap as explained here. This leads to an O(log(n)) worst case, since the heap is always well balanced.
If you are inserting a single node for every one you remove, then you lose the advantage of the asymptotic O(1) average insert that heaps provide as the delete would dominate, and you might as well use a BST. Dijkstra however updates nodes several times for each removal, so we are fine.
Dynamic array heaps vs pointer tree heaps
Heaps can be efficiently implemented on top of pointer heaps: Is it possible to make efficient pointer-based binary heap implementations?
The dynamic array implementation is more space efficient. Suppose that each heap element contains just a pointer to a
struct
:the tree implementation must store three pointers for each element: parent, left child and right child. So the memory usage is always
4n
(3 tree pointers + 1struct
pointer).Tree BSTs would also need further balancing information, e.g. black-red-ness.
the dynamic array implementation can be of size
2n
just after a doubling. So on average it is going to be1.5n
.On the other hand, the tree heap has better worst case insert, because copying the backing dynamic array to double its size takes
O(n)
worst case, while the tree heap just does new small allocations for each node.Still, the backing array doubling is
O(1)
amortized, so it comes down to a maximum latency consideration. Mentioned here.Philosophy
BSTs maintain a global property between a parent and all descendants (left smaller, right bigger).
The top node of a BST is the middle element, which requires global knowledge to maintain (knowing how many smaller and larger elements are there).
This global property is more expensive to maintain (log n insert), but gives more powerful searches (log n search).
Heaps maintain a local property between parent and direct children (parent > children).
The top node of a heap is the big element, which only requires local knowledge to maintain (knowing your parent).
Comparing BST vs Heap vs Hashmap:
BST: can either be either a reasonable:
heap: is just a sorting machine. Cannot be an efficient unordered set, because you can only check for the smallest/largest element fast.
hash map: can only be an unordered set, not an efficient sorting machine, because the hashing mixes up any ordering.
Doubly-linked list
A doubly linked list can be seen as subset of the heap where first item has greatest priority, so let's compare them here as well:
O(1)
worst case since we have pointers to the items, and the update is really simpleO(1)
average, thus worse than linked list. Tradeoff for having more general insertion position.O(n)
for bothAn use case for this is when the key of the heap is the current timestamp: in that case, new entries will always go to the beginning of the list. So we can even forget the exact timestamp altogether, and just keep the position in the list as the priority.
This can be used to implement an LRU cache. Just like for heap applications like Dijkstra, you will want to keep an additional hashmap from the key to the corresponding node of the list, to find which node to update quickly.
Comparison of different Balanced BST
Although the asymptotic insert and find times for all data structures that are commonly classified as "Balanced BSTs" that I've seen so far is the same, different BBSTs do have different trade-offs. I haven't fully studied this yet, but it would be good to summarize these trade-offs here:
See also
Similar question on CS: https://cs.stackexchange.com/questions/27860/whats-the-difference-between-a-binary-search-tree-and-a-binary-heap