Adding up the memory usage of all processes will not generally produce meaningful results. That will leave two major users of memory unaccounted for, the system cache and the standby list. You cannot account for memory usage by simply adding up a list of numbers. The memory management system is far too complex for that.
What you are referring to is process checkpointing. There is some work in the later kernels to offer this (in conjunction with the freezer cgroup), but it's not ready yet.
Unfortunately, this is very difficult to do well, because certain shared resources go stale after being unavailable for a fixed period of time (TCP connections spring to mind, although this may also apply to applications that use a wall clock, or to shared memory that changes state while a process is offline).
As for stopping the process when it reaches a certain memory utilization, there's a hack I can think of that will do this.
- You create a cgroup that contains the freezer and memory subsystems.
- Place your task(s) inside of the cgroup.
- Attach a process to `cgroup.event_control` and set a memory threshold that you do not want to exceed (this is somewhat explained in the kernel documentation).
- When the threshold is exceeded, freeze the cgroup. The kernel should eventually evict these pages to swap (provided your cgroup has enough swap available).
Note that freezing the cgroup does not by itself write its pages out to persistent storage. The pages are only swapped out once enough time has passed and the memory is needed for something else.
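The steps above can be sketched roughly as follows. This is a minimal sketch, not a tested recipe: it assumes a cgroup v1 hierarchy with the memory and freezer controllers, root privileges, and Python 3.10+ for `os.eventfd`. The function names and the directory paths in the usage note are made up for illustration.

```python
import os


def event_control_line(event_fd, usage_fd, threshold_bytes):
    """Build the "<event_fd> <control_fd> <args>" string that cgroup v1
    expects to be written to cgroup.event_control."""
    return f"{event_fd} {usage_fd} {threshold_bytes}"


def set_freezer_state(freezer_dir, state):
    """Write FROZEN or THAWED to the cgroup's freezer.state file."""
    if state not in ("FROZEN", "THAWED"):
        raise ValueError(f"unexpected freezer state: {state}")
    with open(os.path.join(freezer_dir, "freezer.state"), "w") as f:
        f.write(state)


def freeze_on_threshold(memory_dir, freezer_dir, threshold_bytes):
    """Block until memory usage crosses the threshold, then freeze.

    memory_dir and freezer_dir are the cgroup's directories under the
    memory and freezer hierarchies (they may be the same directory if
    both controllers are mounted together).
    """
    efd = os.eventfd(0)  # assumption: Python 3.10+ on Linux
    usage_fd = os.open(
        os.path.join(memory_dir, "memory.usage_in_bytes"), os.O_RDONLY
    )
    try:
        # Register the eventfd against memory.usage_in_bytes.
        with open(os.path.join(memory_dir, "cgroup.event_control"), "w") as ctl:
            ctl.write(event_control_line(efd, usage_fd, threshold_bytes))
        # The read blocks until the kernel signals the eventfd. The
        # notification fires when usage crosses the threshold in either
        # direction, so a more careful version would re-check the usage.
        os.read(efd, 8)
        set_freezer_state(freezer_dir, "FROZEN")
    finally:
        os.close(usage_fd)
        os.close(efd)
```

A call might look like `freeze_on_threshold("/sys/fs/cgroup/memory/mygroup", "/sys/fs/cgroup/freezer/mygroup", 512 * 1024 * 1024)`, where both paths are hypothetical. Thawing later is `set_freezer_state(freezer_dir, "THAWED")`.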
Even if this does work (and it's pretty hacky if it does), you need to consider whether it really does anything to solve your problem.
- How do you know it wouldn't be better to let a memory-hungry process just run faster, so that it finishes its memory-intensive period quickly and relinquishes the memory?
- If you try to wake processes up fairly by round-robining them, you could argue you're doing a worse job than the CPU scheduler is already doing for you.
- If some processes are more important than others (and should be awake longer or finish sooner), it's probably better to just allocate them more CPU time than to keep other processes completely frozen.
- Whilst it would be slow, you could add a lot of swap (so you can never overcommit) and then greatly reduce the interactivity of the scheduler to help reduce aggressive page evictions. This is done with `sched_min_granularity_ns`.
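For the last point, a small helper for inspecting and raising the tunable might look like the sketch below. The helper names are hypothetical, and the file location is an assumption: on older kernels the value is exposed as `/proc/sys/kernel/sched_min_granularity_ns`, but where it lives (and whether it exists at all) depends on the kernel version, so the path is passed in rather than hard-coded.

```python
def read_tunable_ns(path):
    """Return the integer nanosecond value stored in a tunable file."""
    with open(path) as f:
        return int(f.read().strip())


def write_tunable_ns(path, value_ns):
    """Write a new nanosecond value to a tunable file (needs root on a
    real system). A larger minimum granularity lets each task run longer
    before it can be preempted, at the cost of interactivity."""
    if value_ns <= 0:
        raise ValueError("value must be a positive number of nanoseconds")
    with open(path, "w") as f:
        f.write(str(value_ns))
```

On a kernel that exposes the file, `read_tunable_ns("/proc/sys/kernel/sched_min_granularity_ns")` would show the current setting before you change it.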
Unfortunately, the best solution would be the ability to checkpoint your tasks. It's a shame that most of the implementations are just not mature enough yet.
Alternatively, you could wait a couple of years for proper checkpoint/restore to be available in the kernel!
Best Answer
You can't do this without being root. As root, use `svmon -U`.
You can get some information from `ps`, `topas`, maybe even `nmon`, but not down to the user level. That's what `svmon` is for.