This will basically depend on your virtual machines and their memory usage. ESXi employs a number of techniques that allow it to overcommit memory for guests:
Memory compression: memory pages that have been inactive for a while are compressed, then uncompressed and served upon request instead of being swapped to disk or ballooned. The compression cache has a configurable upper limit, set to 10% of the guest's assigned memory by default, and according to this VMware white paper you can roughly estimate a 6% performance decrease when the compression cache is used in real-world scenarios.
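As a rough illustration of the mechanism (not VMware's actual implementation; the class, constants, and zlib choice are all mine), a compression cache can be modeled as a size-capped store of compressed pages that is consulted before falling back to swap:

```python
import zlib

PAGE_SIZE = 4096          # typical page size in bytes
CACHE_FRACTION = 0.10     # default cap: 10% of the guest's assigned memory

class CompressionCache:
    """Toy model of a per-guest page compression cache."""
    def __init__(self, guest_mem_bytes):
        self.capacity = int(guest_mem_bytes * CACHE_FRACTION)
        self.used = 0
        self.pages = {}   # page number -> compressed bytes

    def compress(self, page_no, data):
        """Try to stash an inactive page; fall back (e.g. to swap) if full."""
        blob = zlib.compress(data)
        if self.used + len(blob) > self.capacity:
            return False  # cache full: the hypervisor would swap instead
        self.pages[page_no] = blob
        self.used += len(blob)
        return True

    def fault_in(self, page_no):
        """Serve a guest access by decompressing, much cheaper than disk I/O."""
        blob = self.pages.pop(page_no)
        self.used -= len(blob)
        return zlib.decompress(blob)
```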
Transparent page sharing: virtual memory pages of different guests that are found to carry identical content are mapped to the same physical memory page. This is an asynchronous operation that regularly frees duplicate memory pages.
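Conceptually this works like content-addressable deduplication. The sketch below (hashing alone is a simplification of how ESXi actually matches and write-protects pages) maps identical page contents of different guests to one backing copy:

```python
import hashlib

def share_pages(guest_pages):
    """guest_pages: {guest_id: [page_bytes, ...]}.

    Returns one physical copy per unique page content plus a per-guest
    mapping into it, mimicking transparent page sharing.
    """
    backing = {}    # content hash -> single physical copy
    mapping = {}    # (guest_id, page_index) -> content hash
    for guest, pages in guest_pages.items():
        for i, page in enumerate(pages):
            digest = hashlib.sha256(page).hexdigest()
            backing.setdefault(digest, page)   # first copy wins
            mapping[(guest, i)] = digest       # later duplicates just reference it
    return backing, mapping

# Two guests with an identical zero page end up sharing one physical copy:
backing, _ = share_pages({"vm1": [b"\x00" * 4096], "vm2": [b"\x00" * 4096]})
assert len(backing) == 1
```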
Ballooning: a kernel-level driver in the guest, supplied with the VMware Tools, claims memory in the guest's nonpaged memory pool and marks it as "free" for the hypervisor. This way, the memory is effectively "stolen" from the guest temporarily, inducing guest-level swapping should the guest really need it.
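The balloon's effect on both sides can be sketched as follows; the class and numbers are illustrative, not the real vmmemctl interface:

```python
class Guest:
    """Toy guest: inflating the balloon pins guest memory so the
    hypervisor can hand the underlying physical pages to other VMs."""
    def __init__(self, assigned_mb):
        self.assigned_mb = assigned_mb
        self.balloon_mb = 0

    def inflate_balloon(self, target_mb):
        # The driver allocates nonpaged memory inside the guest...
        self.balloon_mb = min(target_mb, self.assigned_mb)
        # ...and the guest OS, seeing less free memory, may start swapping.
        return self.balloon_mb  # MB the hypervisor can now reclaim

    def usable_mb(self):
        return self.assigned_mb - self.balloon_mb

vm = Guest(assigned_mb=4096)
reclaimed = vm.inflate_balloon(target_mb=512)
print(f"hypervisor reclaimed {reclaimed} MB, guest sees {vm.usable_mb()} MB usable")
```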
Hypervisor swapping: if everything else fails and more memory is needed, ESXi swaps guest memory pages to disk. The location of the swap file is configurable; by default it is placed in the same directory as the guest's configuration files.
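Swapping is the last resort because every fault costs a disk round trip. A minimal sketch of the idea (the file name and layout are made up, not the real .vswp format):

```python
import os, tempfile

class HypervisorSwap:
    """Toy per-VM swap file, one slot per 4 KiB page."""
    PAGE = 4096

    def __init__(self, vm_dir):
        # By default the swap file lives next to the VM's config files.
        self.f = open(os.path.join(vm_dir, "toy.vswp"), "w+b")
        self.slots = {}  # page number -> file offset

    def swap_out(self, page_no, data):
        offset = len(self.slots) * self.PAGE
        self.f.seek(offset)
        self.f.write(data.ljust(self.PAGE, b"\x00"))
        self.slots[page_no] = offset

    def swap_in(self, page_no):
        self.f.seek(self.slots[page_no])
        return self.f.read(self.PAGE)

swap = HypervisorSwap(tempfile.mkdtemp())
swap.swap_out(7, b"guest page contents")
assert swap.swap_in(7).startswith(b"guest page contents")
```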
For my typical loads, I have found that page compression and page sharing yield around 10% in memory savings over the memory overhead ESXi incurs, without notable performance degradation. Ballooning will always work as long as it is configured to (you can effectively turn it off by reserving the entire memory amount for the guest), but it is only marginally better than swapping: it helps where your guests would otherwise dynamically claim large amounts of memory for caching, yet if the guests are already memory-starved, it cannot work magic and will incur disk I/O through thrashing, just as hypervisor-level swapping would.
All summed up: if you could overcommit your guests by just about 10% and they would continue to run without in-guest swapping and the accompanying performance degradation, you would likely be fine with your 40% overcommitment. If not, you definitely would not.
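To put rough numbers on that (the host size is invented, and the ~10% saving from above is applied to configured memory for simplicity):

```python
host_mb = 32768                     # physical RAM on the host
configured_mb = int(host_mb * 1.4)  # 40% overcommitment -> 45,875 MB configured

savings_mb = int(configured_mb * 0.10)   # ~10% from compression + sharing
shortfall_mb = configured_mb - host_mb - savings_mb

print(f"configured: {configured_mb} MB, reclaimed cheaply: {savings_mb} MB")
print(f"left to ballooning/swapping if all guests touch their memory: {shortfall_mb} MB")
```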
The output of the memory page of esxtop (just press m after starting esxtop from the SSH console) will give you real-time memory statistics in more detail than the graphs in the vSphere client, so it might be worth looking there:
1:54:52pm up 34 days 8:39, 214 worlds; MEM overcommit avg: 0.00, 0.00, 0.00
PMEM /MB: 32766 total: 1031 vmk, 29568 other, 2166 free
VMKMEM/MB: 32103 managed: 1926 minfree, 13525 rsvd, 18577 ursvd, high state
NUMA /MB: 8123 ( 767), 8157 ( 2425), 8157 ( 186), 7835 ( 128)
PSHARE/MB: 2162 shared, 139 common: 2023 saving
SWAP /MB: 0 curr, 0 rclmtgt: 0.00 r/s, 0.00 w/s
ZIP /MB: 17 zipped, 10 saved
MEMCTL/MB: 295 curr, 292 target, 14289 max
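If you want to track these numbers over time rather than watch the screen, the lines above are regular enough to scrape; this toy parser (the regexes and field names are my own, applied to a captured screen rather than esxtop's batch mode) pulls out the sharing, compression, and swap figures:

```python
import re

SCREEN = """PSHARE/MB: 2162 shared, 139 common: 2023 saving
ZIP /MB: 17 zipped, 10 saved
SWAP /MB: 0 curr, 0 rclmtgt: 0.00 r/s, 0.00 w/s"""

def memory_savings(screen):
    """Extract MB saved by page sharing and compression, plus current swap."""
    pshare = int(re.search(r"PSHARE/MB:.*?(\d+) saving", screen).group(1))
    zipped = int(re.search(r"ZIP\s*/MB:.*?(\d+) saved", screen).group(1))
    swapped = int(re.search(r"SWAP\s*/MB:\s*(\d+) curr", screen).group(1))
    return pshare, zipped, swapped

pshare, zipped, swapped = memory_savings(SCREEN)
print(f"sharing saves {pshare} MB, compression saves {zipped} MB, {swapped} MB swapped")
```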
After some (long) conversations with VMware support, I have come to the following understanding:
The number in "Reserved Capacity" is not a function of the memory configuration for the cluster's VMs. It is the sum of several factors: any memory reservations declared on VMs, a value calculated from the HA admission policy, and an additional amount for memory management overhead. The HA admission control value is directly derived from the admission control policy; in my case, since I had it set to tolerate a single host's failure, the total amount of RAM on one of my hosts was added to the cluster's reserved capacity.
Among other constraints, it appears that HA admission control will not allow the reserved capacity to exceed the RAM in a single host. (Either that or it won't allow the available capacity to drop below the RAM on a single host; I'm still not clear on which of these is really the case, since they're the same thing in my two-host cluster.) This has the net result that practically any amount of memory reservation is incompatible with what would otherwise seem to be natural settings for HA admission policy in a two-host cluster. Since Fault Tolerance forces memory reservations, that makes it similarly incompatible. I was told that if there were more hosts in the cluster, the reserved capacity would be "spread out" across more of them and some degree of memory reservation would be possible.
The net result for me is that I had to change my HA admission policy to reserve a percentage of the available resources (instead of "one host's worth") and calculate that percentage to exclude any memory reservations necessitated by the use of Fault Tolerance.
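One possible reading of that calculation as back-of-the-envelope arithmetic (the cluster sizes and FT reservations here are invented, and this is my interpretation of "excluding" the FT reservations, not a formula from VMware):

```python
hosts_mb = [32768, 32768]     # two-host cluster
ft_reservations_mb = 8192     # RAM pinned by Fault Tolerance VMs

cluster_mb = sum(hosts_mb)
# Reserve roughly one host's worth for failover, less what FT already
# reserves, expressed as the percentage the admission policy asks for:
failover_pct = 100 * (max(hosts_mb) - ft_reservations_mb) / cluster_mb
print(f"set 'percentage of cluster resources reserved' to ~{failover_pct:.0f}%")
```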
Best Answer
The impact is greater than if you were running an older CPU with no support for EPT, but realistically the only way to determine if it's going to affect your workload is to actually profile the workload. Don't take the hypervisor out of the equation, because the whole point is to test the setup that you're running, not some hypothetical benchmark figure. Ignore Memtest86+, ignore bare-metal OSes, just find a virtual machine that's representative of a memory-intensive workload in your environment and beat the crap out of it.
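If you need something to beat it with while you assemble a representative VM, a crude memory thrasher run inside a test guest will at least exercise TLB misses and nested page table walks; this is a stand-in under that assumption, not a substitute for profiling your real workload:

```python
import random, time

def thrash(total_mb=1024, rounds=5, page=4096):
    """Touch random pages across a large buffer to defeat CPU caches and
    stress TLB / nested (EPT) page table walks."""
    buf = bytearray(total_mb * 1024 * 1024)
    pages = len(buf) // page
    t0 = time.perf_counter()
    for _ in range(rounds):
        for _ in range(pages):
            i = random.randrange(pages) * page
            buf[i] = (buf[i] + 1) & 0xFF   # read-modify-write one byte per page
    return (rounds * pages) / (time.perf_counter() - t0)

print(f"{thrash():,.0f} random page touches/sec")
```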
Guy is dead-on: most consolidated workloads are bound by the amount of memory in the system rather than by any other resource. The extra memory will probably help you by decreasing memory contention, letting your memory-intensive VMs keep more of their RAM as cache instead of having it ballooned out under pressure.