@David Schwartz: I am pretty sure the kernel OOM killer kills the process. And yes, we need to know which process is being killed.
I am pretty sure the process that's being killed is misbehaving in some way (or crashing); as a result it uses up most of the available memory, at which point the kernel's OOM killer decides to finish it off. For example, this kind of behaviour was rampant (in my case) a decade or so ago, when mozilla/firefox was more prone to leaking memory than it is now. It would just use more and more, and then suddenly disappear... you get the idea.
Edit: Please note that the comment section is now irrelevant because my original answer is gone.
Your question:
How can the Private Bytes of a process be significantly less than its effect on the system commit charge?
This can be answered with a direct quote from Mark Russinovich:
There are two types of process virtual memory that do count toward the commit limit: private and pagefile-backed.
The private bytes attributed to the process can be (and often are) less than that process's effect on the system commit charge, because the process can also be allocating pagefile-backed virtual memory.
Pagefile-backed virtual memory is difficult to attribute to a specific process because it is sharable between processes. There is no process-specific performance counter that can tell you how much pagefile-backed virtual memory any process has allocated or is referencing, yet, it does still count against the commit limit.
This article is the authoritative one on the subject. In it, he specifically demonstrates a case where a process has allocated a large amount of pagefile-backed VM, and yet the private bytes of the process remain very low.
He also shows you how to use handle.exe to detect the allocation size of handles to section objects. That is how you can detect which process is having such a large effect on the commit charge.
You mention that you have already looked at sqlservr.exe with handle.exe and that it does not have handles open to a significant amount of section objects that would account for the commit charge that is released when you kill sqlservr.exe.
Incidentally, there are also memory allocations in kernel space that are charged against the system commit limit, such as the paged and nonpaged pools and driver-locked memory, including things like virtual machine balloon drivers. I don't believe this is relevant to this case, but I didn't want to leave it unsaid.
SQL Server is a massive, complex product consisting of many different processes that work together on the system to provide all the SQL Server services. In fact SQL Server has its very own internal memory manager that can make it look atypical from the perspective of tools designed to measure Windows virtual memory allocations.
sqlservr.exe does not act alone. There's also:
msmdsrv.exe (Analysis Services)
sqlwriter.exe (SQL VSS Writer)
sqlagent.exe (SQL Agent)
fdlauncher.exe (Full-Text Filter Daemon Launcher)
fdhost.exe (Full-Text host)
ReportingServicesService.exe
SQLBrowser.exe
When I kill sqlservr.exe, sqlagent.exe also dies automatically. This means the system commit charge will fall by the amount contributed to it by both processes. The other SQL-related processes may also be releasing pagefile-backed sections when sqlservr.exe is killed, even though the processes themselves remain running. All of these would cause the current commit charge of the system to fall when sqlservr.exe is killed, even though they were never part of the private bytes of sqlservr.exe.
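To make the accounting concrete, here is a toy model of the idea, not a measurement of any real system. It assumes the commit charge is simply the sum of every process's private bytes plus every pagefile-backed section that still has at least one holder. The process names (server.exe, agent.exe, helper.exe) and all sizes are hypothetical.

```python
from dataclasses import dataclass

MIB = 1024 * 1024


@dataclass
class Section:
    """A pagefile-backed section shared by zero or more processes (toy model)."""
    size: int
    holders: set


def commit_charge(private, sections):
    # Private bytes of every live process, plus every section that
    # still has at least one holder keeping it alive.
    return sum(private.values()) + sum(s.size for s in sections if s.holders)


def kill(private, sections, names):
    # Remove the processes and drop their references to shared sections.
    for name in names:
        private.pop(name, None)
    for s in sections:
        s.holders -= set(names)


# Hypothetical numbers, chosen only for illustration.
private = {"server.exe": 200 * MIB, "agent.exe": 50 * MIB, "helper.exe": 100 * MIB}
sections = [
    Section(1500 * MIB, {"server.exe"}),
    Section(300 * MIB, {"server.exe", "agent.exe"}),
    Section(64 * MIB, {"server.exe", "helper.exe"}),  # survives: helper.exe still holds it
]

server_private = private["server.exe"]
before = commit_charge(private, sections)
kill(private, sections, ["server.exe", "agent.exe"])  # agent.exe dies with server.exe
drop = before - commit_charge(private, sections)

print(f"private bytes of server.exe: {server_private // MIB} MiB")
print(f"commit charge released:      {drop // MIB} MiB")
```

In this model the drop is much larger than the private bytes of the killed process, because it also includes the dependent process and every section whose last holder went away.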
Best Answer
Don't confuse free memory with unused memory. Free memory, in the Unix world, is a page of physical memory that has no logical data mapped to it. Unused memory does have some data mapped to it, but it is currently not in active use by a running process.
Linux (and all Unix OSes) tries to have as little free memory as possible. Instead, it uses memory that is not actively mapped to processes in the running OS for things like file cache and buffers for various IO transfer operations.
Something else that may be confusing you is that you cannot simply add up the memory in use by all running processes to get a total memory-in-use figure. If you attempted this, you would quickly discover that your applications appear to be using more memory than actually exists on the machine. This is for two reasons
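On Linux you can see this distinction in /proc/meminfo. The following is a minimal sketch that separates truly free pages (MemFree) from memory the kernel is using for cache and buffers. The helper names and the sample figures are made up for illustration. MemAvailable is only present on kernels 3.14 and later, so the sketch treats it as optional.

```python
def parse_meminfo(text):
    """Return {field: value} from /proc/meminfo-style text (values mostly in kB)."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts:
            info[name.strip()] = int(parts[0])
    return info


def summarize(info):
    # Buffers + Cached is memory holding data but not mapped by any process;
    # the kernel can reclaim most of it when applications need more.
    cache = info.get("Buffers", 0) + info.get("Cached", 0)
    return {
        "total_kib": info["MemTotal"],
        "free_kib": info["MemFree"],
        "cache_kib": cache,
        "available_kib": info.get("MemAvailable"),  # None on old kernels
    }


# Made-up sample; on a real machine, pass open("/proc/meminfo").read() instead.
SAMPLE = """\
MemTotal:        8000000 kB
MemFree:          200000 kB
MemAvailable:    5000000 kB
Buffers:          300000 kB
Cached:          4500000 kB
"""

summary = summarize(parse_meminfo(SAMPLE))
for key, value in summary.items():
    print(f"{key:>14}: {value}")
```

In the sample, very little memory is free, yet most of it is available, because the bulk is cache that can be dropped on demand.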
There is a recent article on lwn.net discussing this issue.