VMware ESX virtual machine, Windows Server 2008 R2: memory nearly full, but process total nowhere near it

memory, memory-usage, vmware-esx, windows-server-2008

We have set up a TFS 2010 server on Windows Server 2008 R2, and it has recently started timing out and taking a long time to complete operations.

When I log on to the machine, Task Manager's Performance tab shows 3.86 GB of 4 GB allocated, yet when I go to the Processes tab and sum up the memory of all running processes, I end up somewhere between 700 and 900 MB, depending on how long the machine has been running.
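To put a number on the gap, here is a minimal Python sketch of the arithmetic using the figures quoted above. It takes 1 GB as 1024 MB, and the function name `unaccounted_mb` is hypothetical, used only for illustration:

```python
# Figures from the question: Task Manager reports 3.86 GB allocated,
# while the per-process values sum to roughly 700-900 MB.
def unaccounted_mb(in_use_gb: float, process_sum_mb: float) -> float:
    """Return memory (MB) that is in use but not attributed to any listed process."""
    return in_use_gb * 1024 - process_sum_mb

for process_sum in (700, 900):
    gap = unaccounted_mb(3.86, process_sum)
    print(f"processes sum to {process_sum} MB -> {gap:.0f} MB unaccounted")
```

Either way, roughly 3 GB is in use without showing up against any process in the list.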

I found this question in the similar titles list, hoping it would help me:

Weird memory usage in Windows Server 2008 R2

and I ran the SQL query from that question, which gave me the following:

object_name               counter_name               cntr_value  cntr_value_MB
------------------------  -------------------------  ----------  -------------
SQLServer:Buffer Manager  Database pages             988         8.000000000
SQLServer:Buffer Manager  Free pages                 140         1.000000000
SQLServer:Buffer Manager  Total pages                2923        23.000000000
SQLServer:Memory Manager  Target Server Memory (KB)  23384       22.000000000
SQLServer:Memory Manager  Total Server Memory (KB)   23384       22.000000000

I'm not sure whether this explains the memory problem, but if it were the same problem as in that question, I would expect much higher values in the last column. I have also capped SQL Server's memory usage at 3 GB, and the Processes tab currently shows sqlservr.exe using 92 MB.
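For reading the table: the Buffer Manager "pages" counters count 8 KB pages, while the Memory Manager counters are already in KB. A minimal Python sketch of the conversion follows. The dictionary and `to_mb` helper are hypothetical names, and the MB column in the query output appears to be rounded, so it differs slightly from these values:

```python
# SQL Server counters from the table above.
# Buffer Manager "pages" counters count 8 KB pages; the Memory Manager
# counters are reported in KB.
PAGE_KB = 8

counters = {
    "Database pages": (988, "pages"),
    "Free pages": (140, "pages"),
    "Total pages": (2923, "pages"),
    "Target Server Memory (KB)": (23384, "KB"),
    "Total Server Memory (KB)": (23384, "KB"),
}

def to_mb(value: int, unit: str) -> float:
    """Convert a counter value (8 KB pages or KB) to megabytes."""
    kb = value * PAGE_KB if unit == "pages" else value
    return kb / 1024

for name, (value, unit) in counters.items():
    print(f"{name:<28} {to_mb(value, unit):7.2f} MB")

# 2923 pages * 8 KB = 23384 KB, the same as Total Server Memory,
# so SQL Server as a whole is holding only about 23 MB.
```

So the counters put SQL Server at about 23 MB in total, far short of the roughly 3 GB that is missing.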

Also note that 4 GB used to be enough for this machine, and we had no performance problems or complaints while it was operating normally, so the recent change in behavior is troubling.

The machine is a virtual machine running on a VMware ESX 4.1 server. Could that be the cause? I've read about the "memory balloon" that VMware uses to reclaim memory from guests when the host is under memory pressure, but the physical server has 32 GB in total and the performance overview shows 17 GB of that as available.

What else should I be looking at, or how else should I be looking at the above data?

A reboot brings the numbers back to what I would call explainable levels, but memory usage slowly creeps back up to 4 GB over the course of a day, and then the timeouts start.
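One way to check whether the creep is steady (as a leak would be) is to record memory in use a few times during the day and fit a line. The following is a minimal Python sketch under stated assumptions: the sample values are made up for illustration, the function names are hypothetical, and 4 GB is taken as 4096 MB:

```python
import math

# Hypothetical samples: (hours since reboot, MB in use). Replace these with
# readings taken from Task Manager or Performance Monitor during the day.
samples = [(0, 900), (4, 1500), (8, 2100), (12, 2700), (16, 3300)]

def leak_rate_mb_per_hour(points):
    """Least-squares slope of memory in use over time, in MB per hour."""
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_m = sum(m for _, m in points) / n
    cov = sum((t - mean_t) * (m - mean_m) for t, m in points)
    var = sum((t - mean_t) ** 2 for t, _ in points)
    return cov / var

def hours_until(limit_mb, points):
    """Extrapolate linearly from the last sample to when limit_mb is reached."""
    rate = leak_rate_mb_per_hour(points)
    if rate <= 0:
        return math.inf  # usage is flat or shrinking; no projected limit
    _, last_m = points[-1]
    return (limit_mb - last_m) / rate

print(f"growth: {leak_rate_mb_per_hour(samples):.0f} MB/h")
print(f"4 GB reached in about {hours_until(4096, samples):.1f} h")
```

A roughly constant slope from one reboot to the next fits the "slowly creeps back up" pattern, and the projection gives a rough estimate of when the timeouts would return.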

Also note that I'm not 100% sure that memory is causing the timeouts, but the machine runs fine until memory usage approaches 4 GB, so at the very least the two seem linked, though they could of course be two separate effects of the same cause.

There have not been any Windows updates on this machine for at least a month, for good or bad, so no maintenance coincides with when the problems started.

Best Answer

The performance of a host degrades sharply whenever "thrashing" occurs, that is, continuous swapping of memory pages in and out under tight memory conditions.

You might have a memory leak somewhere. If Task Manager does not show excessive memory usage for any single process (BTW, which value were you looking at? Task Manager typically shows private bytes, though you should be looking at the "working set" for current physical memory usage), a kernel module or driver is another possible candidate. Take a look at Process Explorer's memory statistics, especially the kernel memory usage. They are more detailed and might get you a step closer to a resolution.
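As a rough bookkeeping check, you can compare the unexplained gap with the kernel pool sizes. Task Manager's Performance tab shows these under "Kernel Memory" (Paged and Nonpaged), and Process Explorer's System Information view shows them as well. The following is a minimal Python sketch; the field names and the reading are hypothetical values that you would fill in by hand. It is not an exact accounting, because Windows has other kernel-side consumers besides the two pools:

```python
from dataclasses import dataclass

@dataclass
class MemoryReading:
    """One snapshot, in MB. Field names are hypothetical; fill them in by hand."""
    in_use: float          # Task Manager: memory in use
    process_total: float   # sum over the Processes tab
    paged_pool: float      # Kernel Memory: Paged
    nonpaged_pool: float   # Kernel Memory: Nonpaged

def explain_gap(r: MemoryReading) -> dict:
    """Split the process-vs-total gap into kernel pools and whatever remains."""
    gap = r.in_use - r.process_total
    kernel = r.paged_pool + r.nonpaged_pool
    return {
        "gap_mb": gap,
        "kernel_pools_mb": kernel,
        "still_unexplained_mb": gap - kernel,
    }

# Hypothetical reading: a nonpaged pool that keeps growing between
# snapshots would point toward a leaking driver.
reading = MemoryReading(in_use=3950, process_total=800, paged_pool=250, nonpaged_pool=2600)
print(explain_gap(reading))
```

If the pools account for most of the gap and keep growing over the day, the driver route is worth pursuing. If they stay small, the missing memory is going somewhere else.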