Linux (non-transparent) per-process hugepage accounting

Tags: java, linux, linux-kernel

I've recently converted some Java apps to run with manually configured hugepages on Linux, as described here. I point out "manually configured" because they are not transparent hugepages, which gave us some performance issues.

So now I've got about 10 Tomcats running on a system, and I am interested in knowing how much memory each one is using.

I can get summary information out of /proc/meminfo as described in Linux Huge Pages Usage Accounting.
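For reference, the system-wide counters there look like this (the values below are made up for illustration):

$ grep ^Huge /proc/meminfo
HugePages_Total:    1024
HugePages_Free:      768
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB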

But I can't find any tools that tell me about the actual per-process hugepage usage.

I poked around in /proc/[pid]/numa_maps and found some interesting information that led me to this grossity:

function pshugepage () {
    local pid=$1
    local hugepagecount=0
    # pull the dirty= page count out of each hugepage mapping in numa_maps,
    # instead of assuming dirty= is always the sixth whitespace-separated field
    for num in $(sed -n '/anon_hugepage.*dirty=/ s/.*dirty=\([0-9]\+\).*/\1/p' "/proc/$pid/numa_maps"); do
        hugepagecount=$((hugepagecount + num))
    done
    echo "process $pid using $hugepagecount huge pages"
}
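For reference, the dirty= fields it sums come from numa_maps lines like this one (address and page counts made up for illustration):

2aaaaac00000 default file=/anon_hugepage\040(deleted) huge dirty=512 N0=512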

Or the same thing in Perl:

sub counthugepages {
    my ($pid) = @_;
    open(my $numa_maps, '<', "/proc/$pid/numa_maps")
        or die "can't open /proc/$pid/numa_maps: $!";
    my $hugepagecount = 0;
    while (my $line = <$numa_maps>) {
        # hugepage mappings carry both a " huge " flag and a dirty= page count
        next unless $line =~ m{ huge .*dirty=(\d+)};
        $hugepagecount += $1;
    }
    close($numa_maps);
    # dirty= counts pages; these are 2-megabyte hugepages, so pages * 2 = megabytes
    return $hugepagecount * 2;
}
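As a rough sanity check (my own addition, not from any documentation): summed across all processes, those dirty= counts should land near the number of hugepages actually faulted in system-wide, i.e., HugePages_Total minus HugePages_Free from /proc/meminfo:

awk '/^HugePages_Total/ {t=$2} /^HugePages_Free/ {f=$2}
     END {print t-f, "huge pages faulted in system-wide"}' /proc/meminfo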

The numbers it gives me are plausible, but I'm far from confident this method is correct.

Environment is a quad-CPU Dell, 64 GB RAM, RHEL 6.3, Oracle JDK 1.7.x (current as of 2013-07-28).

Best Answer

Update: Red Hat now recommends this method for process hugepage accounting on RHEL5/6:

grep -B 11 'KernelPageSize:     2048 kB' /proc/[PID]/smaps \
   | grep "^Size:" \
   | awk 'BEGIN{sum=0}{sum+=$2}END{print sum/1024}'
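One caveat: the -B 11 assumes Size: sits exactly 11 lines above KernelPageSize: in every smaps entry, and that spacing can shift between kernel versions. Since Size: appears earlier than KernelPageSize: within each entry, here's a variant of my own that keys on the field names instead of a fixed line offset:

awk '/^Size:/           {size = $2}
     /^KernelPageSize:/ {if ($2 == 2048) sum += size}
     END                {print sum/1024}' /proc/[PID]/smaps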

I asked this on the procps-ng developers' mailing list. I was told:

The hugepage support has been introduced in the procps-ng/pmap tool several months ago (switches -XX, -C, -c, -N, -n should allow you to configure and display any entries supported by the running kernel).

I experimented a bit with procps-3.3.8 on Fedora 19. I don't think it gave me any information I didn't get from the stuff I suggested in my question, but at least it has the aura of authority.

FWIW I ended up with the following:

A ~/.pmaprc file containing:

[Fields Display]
Size
Rss
Pss
Referenced
AnonHugePages
KernelPageSize
Mapping

[Mapping]
ShowPath

And then I used the following command to pull hugepage information:

pmap -c [process id here] | egrep 'Add|2048'

In the egrep, "Add" matches the header line, and "2048" grabs anything with a kernel page size of 2048, i.e., huge pages. It will also grab unrelated lines: anything else that happens to contain the string 2048.

Here's some sample output:

     Address    Size   Rss   Pss Referenced AnonHugePages KernelPageSize Mapping
    ed800000   22528     0     0          0             0           2048 /anon_hugepage (deleted)
    f7e00000   88064     0     0          0             0           2048 /anon_hugepage (deleted)
    fd400000   45056     0     0          0             0           2048 /anon_hugepage (deleted)
7f3753dff000    2052  2048  2048       2048          2048              4 [stack:1674]
7f3759000000    4096     0     0          0             0           2048 /anon_hugepage (deleted)
7f3762d68000    2048     0     0          0             0              4 /usr/lib64/libc-2.17.so
7f376339b000    2048     0     0          0             0              4 /usr/lib64/libpthread-2.17.so

We only care about the lines with KernelPageSize 2048.
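To keep exactly those lines (plus the header) and drop the false positives, you can key on the KernelPageSize column itself; with the field list above it is column 7 (my own variant, not something from the mailing list):

pmap -c [process id here] | awk 'NR == 1 || $7 == 2048'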

I think it's telling me that I've allocated 159744 kB (22528+88064+45056+4096) of RAM in huge pages. I told Java to use exactly 128M for its heap, and it has some other memory pools, so this is a plausible number. Rss & Referenced of 0 don't quite make sense; however, the test Java program is extremely simple, so it too is plausible.
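That addition can be automated the same way, summing the Size column (field 2) over the 2048-kB-page mappings:

pmap -c [process id here] | awk '$7 == 2048 {kb += $2} END {print kb, "kB in hugepage mappings"}'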

That pmap number doesn't agree with the one from my Perl snippet above, because the Perl is counting only "dirty" pages, i.e., ones that have actually been touched. And/or because the Perl is just wrong; I don't know.

I also tried procps-3.3.9 on an RHEL 6 machine with some active Tomcats using lots of hugepage memory. The Rss & Referenced columns were all 0. This may very well be the fault of the kernel rather than procps; I don't know.