How is memory allocated in an ESXi server?

Tags: virtual-machines, virtualization, vmware-esxi

We have an ESXi 4.1 server with 48 GB RAM.

We are allocating 4 GB of memory to each VM. The server will host 13 virtual machines, which adds up to 52 GB, more than the 48 GB installed, so my manager thinks this is wrong.

I am going to explain that ESXi manages memory on its own, but they also asked me how much memory I allocated to the ESXi server itself.

I did not allocate any (I have not even heard of an option for allocating memory for the ESXi server itself).

How is memory allocated for the ESXi server itself? And how does ESXi overcommit and distribute RAM among virtual machines without issues?

Best Answer

There is a lot more than just ESXi to account for here:

  1. Each VM will consume up to its configured 4 GB plus "overhead", which VMware documents per VM configuration. The overhead depends on the number of vCPUs and the amount of memory allocated. A VM that uses all of its memory will therefore consume at least 4261.98 MB (4096 + 165.98).
  2. ESXi's own memory overhead, which is hardware dependent. The easiest way to find it is to look at System memory usage in the vSphere Client. From memory, I recall it is around the 1.3 GB mark, but as stated, that is very dependent on hardware.

Memory Allocation & Overcommitment Explained

Note that the hypervisor won't allocate all of that memory upfront; allocation depends on each VM's actual usage. However, it is worth understanding what will happen should the VMs try to use all of the memory configured for them.
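As a toy model of what "not allocated upfront" means (a deliberate simplification, not how ESX is implemented; the class and method names are hypothetical), host memory is consumed only when a guest first touches one of its pages, so the configured size acts as a ceiling rather than a reservation:

```python
# Toy model (not VMware code): configured guest memory is only a ceiling.
# The host backs a guest page with real memory the first time it is touched.
PAGE_KB = 4

class LazyGuestMemory:
    def __init__(self, configured_mb: int):
        self.configured_pages = configured_mb * 1024 // PAGE_KB
        self.backed = set()  # guest page numbers that have host memory behind them

    def touch(self, guest_page: int) -> None:
        """First access to a guest page makes the host back it; repeats are free."""
        if not 0 <= guest_page < self.configured_pages:
            raise IndexError("page outside configured memory")
        self.backed.add(guest_page)

    def host_consumed_mb(self) -> float:
        """Host memory actually consumed by this guest so far."""
        return len(self.backed) * PAGE_KB / 1024

vm = LazyGuestMemory(configured_mb=4096)
for page in range(256 * 1024 // PAGE_KB):  # guest touches only its first 256 MB
    vm.touch(page)
print(vm.host_consumed_mb())  # 256.0 MB consumed out of 4096 MB configured
```

In this model, a page stays backed even if the guest later frees it internally, because the hypervisor cannot see the guest's free list; getting such memory back is what the reclamation methods below are for.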

The maximum your VMs and host will try to use is approximately 55 GB (your mileage may vary):

  • ~1.3 GB used by ESXi
  • 13 × 4261.98 MB (about 54.1 GB) used by the VMs
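The arithmetic behind the ~55 GB figure, as a small Python sketch. It uses only the numbers above; the function names are illustrative, and the 1.3 GB for ESXi is the rough figure quoted, not a measured value:

```python
# Worst-case host memory demand, using the answer's figures:
# 4096 MB configured per VM, 165.98 MB overhead per VM, ~1.3 GB for ESXi, 13 VMs.
MB_PER_GB = 1024

def vm_footprint_mb(configured_mb: float, overhead_mb: float) -> float:
    """Worst-case host memory for one VM: configured memory plus overhead."""
    return configured_mb + overhead_mb

def max_host_demand_mb(vm_count: int, configured_mb: float,
                       overhead_mb: float, esxi_mb: float) -> float:
    """Worst case: every VM touches all of its configured memory."""
    return esxi_mb + vm_count * vm_footprint_mb(configured_mb, overhead_mb)

esxi_mb = 1.3 * MB_PER_GB  # rough, hardware-dependent figure for ESXi itself
demand = max_host_demand_mb(13, 4096, 165.98, esxi_mb)
print(f"per VM: {vm_footprint_mb(4096, 165.98):.2f} MB")
print(f"total : {demand:.0f} MB (~{demand / MB_PER_GB:.1f} GB)")
```

With these inputs it prints 4261.98 MB per VM and about 56,737 MB (~55.4 GB) in total, against 48 GB of physical RAM.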

There is another aspect to take into account: memory thresholds. By default, VMware aims to keep 6% of host memory free (the high memory threshold). On a 48 GB host that leaves about 45 GB usable (94% of 48 GB), so the ~55 GB of demand needs to be reduced to roughly 45 GB.

That means the host will have approximately 10,500 MB of memory it needs to reclaim from somewhere should the VMs use all the memory they have been allocated. ESX uses three methods to find that additional memory.
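The same estimate extended with the 6% high threshold, as a sketch under the answer's assumptions (48 GB host, worst-case demand built from the figures above; the helper names are hypothetical):

```python
# How much memory must be reclaimed so free memory stays at or above the
# "high" threshold (6% free by default, per the answer).
MB_PER_GB = 1024

def usable_below_high_mb(host_mb: float, high_free_pct: float = 6.0) -> float:
    """Memory the host can hand out while still keeping high_free_pct free."""
    return host_mb * (1 - high_free_pct / 100)

def reclaim_needed_mb(demand_mb: float, host_mb: float,
                      high_free_pct: float = 6.0) -> float:
    """Shortfall that TPS, ballooning and swapping must cover (0 if none)."""
    return max(0.0, demand_mb - usable_below_high_mb(host_mb, high_free_pct))

host_mb = 48 * MB_PER_GB
# Worst case from the answer: ~1.3 GB for ESXi plus 13 VMs of 4096 + 165.98 MB.
demand_mb = 1.3 * MB_PER_GB + 13 * (4096 + 165.98)
print(f"usable before leaving 'high': {usable_below_high_mb(host_mb):.0f} MB")
print(f"to reclaim: {reclaim_needed_mb(demand_mb, host_mb):.0f} MB")
```

With these inputs it reports roughly 46,200 MB usable and roughly 10,500 MB to reclaim, matching the figures in the text.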

Memory Reclamation Methods

  1. Transparent Page Sharing
  2. Memory Ballooning
  3. Hypervisor Swapping

You should read and understand the VMware white paper "Understanding Memory Resource Management in VMware® ESX™ Server".

Depending on a large number of factors, a combination of all three can happen on an overcommitted host. You need to test your environment and monitor these metrics to understand the impact of overcommitting.

Some rough rules worth knowing (all covered in the paper above and other sources):

  1. Transparent page sharing does not happen for VMs that use 2/4 MB pages. Because you have allocated 4096 MB to your Windows VMs, they will use 2/4 MB pages by default (PAE dependent). Only under memory pressure will VMware break the large pages down into 4 KB pages that can be shared. TPS relies on idle CPU cycles and scans memory pages at a set rate, so it returns memory relatively slowly (think an hour rather than minutes). That means TPS will not help you during a boot storm. Of the three methods, it has the lowest performance impact. From the document:

In hardware-assisted memory virtualization (for example, Intel EPT Hardware Assist and AMD RVI Hardware Assist [6]) systems, ESX will automatically back guest physical pages with large host physical pages (2MB contiguous memory region instead of 4KB for regular pages) for better performance due to less TLB misses. In such systems, ESX will not share those large pages because: 1) the probability of finding two large pages having identical contents is low, and 2) the overhead of doing a bit-by-bit comparison for a 2MB page is much larger than for a 4KB page. However, ESX still generates hashes for the 4KB pages within each large page. Since ESX will not swap out large pages, during host swapping, the large page will be broken into small pages so that these pre-generated hashes can be used to share the small pages before they are swapped out. In short, we may not observe any page sharing for hardware-assisted memory virtualization systems until host memory is overcommitted.

  2. Ballooning kicks in next. The thresholds are configurable; by default, ballooning starts when the host has less than 6% memory free (between the high and soft thresholds). Make sure you install the balloon driver (part of VMware Tools), and watch out for Java and managed applications in general: the guest OS has no insight into what the garbage collector will do next, so the collector ends up hitting pages that have been swapped to disk. It is not uncommon for servers that run Java applications exclusively to disable swap entirely to guarantee that doesn't happen. Have a look at page 17 of "vSphere Memory Management" for the SPECjbb results.

  3. Hypervisor swapping is the only one of the three methods that guarantees memory becomes available to the hypervisor within a set time. It is used if methods 1 and 2 do not free enough memory to stay above the hard threshold (by default, 2% free memory). When you read through the performance metrics (run your own tests), you'll realize this is the worst-performing of the three. Aim to avoid it at all costs: the performance impact will be very noticeable, a double-digit percentage slowdown on nearly all applications.

  4. There is one more state to be aware of: low (by default, 1% free). According to the manual, it can drastically cut your performance:

In a rare case where host free memory drops below the low threshold, the hypervisor continues to reclaim memory through swapping and memory compression, and additionally blocks the execution of all virtual machines that consume more memory than their target memory allocations.
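The escalation described in the list above can be summarized as a small classifier. This is a simplified sketch, not ESX's algorithm: it uses only the default thresholds quoted here (high 6%, hard 2%, low 1%), folds the soft threshold (whose value is not given above) into the ballooning band, and treats thresholds as hard cut-offs, whereas the real hypervisor starts reclaiming as free memory approaches each one. The state labels are made up for illustration:

```python
# Simplified model of which reclamation techniques are active for a given
# percentage of free host memory, using the defaults quoted in the answer.
from typing import List, Tuple

HIGH, HARD, LOW = 6.0, 2.0, 1.0  # % of host memory free

def reclamation_actions(free_pct: float) -> Tuple[str, List[str]]:
    """Return (illustrative state label, techniques in use) for a free-memory %."""
    actions = ["transparent page sharing"]  # TPS runs in the background regardless
    if free_pct >= HIGH:
        return "high", actions
    actions.append("ballooning")
    if free_pct >= HARD:
        return "below high", actions
    actions.append("hypervisor swapping / compression")
    if free_pct >= LOW:
        return "below hard", actions
    actions.append("block VMs using more than their target")
    return "below low", actions

for pct in (10, 5, 1.5, 0.5):
    print(pct, *reclamation_actions(pct))
```

Each band keeps the techniques of the band above it and adds one more, which mirrors the point that the cheaper methods keep running while the more disruptive ones are layered on.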

Summary

The key point to stress is that it is impossible to predict from the whitepapers how your environment will behave.

  1. How much can TPS give you? (Depends on how similar your VMs are with their OS, Service Pack, and running applications)
  2. How quickly do your VMs allocate their memory? The faster they do, the more likely you are to jump to the next threshold before the less disruptive reclamation method can keep you in your current one.
  3. Depending on application, each memory reclamation scheme will have widely varying impact.
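To see why the answer to question 1 depends on how similar the VMs are, here is a toy illustration of page sharing. It is not VMware's implementation; it only follows the idea from the quoted paper: hash each 4 KB page, confirm candidates with a full byte comparison (hashes can collide), keep one copy per distinct content, and skip pages backed by large pages. The function name and the sample page contents are hypothetical:

```python
# Toy transparent page sharing: count 4 KB pages that could be freed by
# mapping them to an existing identical copy.
import hashlib
from typing import Dict, List, Tuple

PAGE_SIZE = 4096

def shareable_pages(pages: List[Tuple[bytes, bool]]) -> int:
    """pages: (contents, backed_by_large_page). Returns pages freed by sharing."""
    canonical: Dict[bytes, List[bytes]] = {}  # hash -> distinct contents seen
    saved = 0
    for contents, large in pages:
        if large:
            continue  # large pages are not shared until broken into 4 KB pages
        digest = hashlib.sha1(contents).digest()
        bucket = canonical.setdefault(digest, [])
        if any(contents == existing for existing in bucket):  # full compare
            saved += 1  # this page can point at the existing copy
        else:
            bucket.append(contents)
    return saved

zero = bytes(PAGE_SIZE)             # zeroed page
os_code = b"\x90" * PAGE_SIZE       # stand-in for an OS code page
patched = b"\x91" * PAGE_SIZE       # same page after a different service pack
# Three identical guests with two pages each: 4 of 6 pages can be freed.
print(shareable_pages([(zero, False), (os_code, False)] * 3))
# The same guests backed by large pages: nothing is shared.
print(shareable_pages([(zero, True), (os_code, True)] * 3))
# Two guests at different patch levels: only the zeroed page is shared.
print(shareable_pages([(zero, False), (os_code, False), (zero, False), (patched, False)]))
```

The three calls print 4, 0 and 1: identical guests share the most, large-page backing prevents sharing entirely, and a single difference in OS content reduces the savings.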

Test your average scenario, your 95th-percentile scenario, and finally your maximum to understand how your environment will run.


Edit 1

It is worth adding that with vSphere 4 (or 4.1, I can't recall), it is now possible to place the hypervisor swap file on local disk and still vMotion the VM. If you're using shared storage, I strongly recommend moving the hypervisor swap file to local disk by default. This ensures that when one host is under severe memory pressure, it doesn't end up impacting all the other vSphere hosts and VMs on the same shared storage.

Edit 2

Based on comments, put the fact that ESX doesn't allocate the memory upfront in bold.

Edit 3

Explained a little more about memory thresholds.