When I view top on one of our servers, there are a lot of nfsd processes consuming CPU:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2769 root 20 0 0 0 0 R 20 0.0 2073:14 nfsd
2774 root 20 0 0 0 0 S 19 0.0 2058:44 nfsd
2767 root 20 0 0 0 0 S 18 0.0 2092:54 nfsd
2768 root 20 0 0 0 0 S 18 0.0 2076:56 nfsd
2771 root 20 0 0 0 0 S 17 0.0 2094:25 nfsd
2773 root 20 0 0 0 0 S 14 0.0 2091:34 nfsd
2772 root 20 0 0 0 0 S 14 0.0 2083:43 nfsd
2770 root 20 0 0 0 0 S 12 0.0 2077:59 nfsd
How do I find out what these are actually doing? Can I see a list of files being accessed by each PID, or any more info?
We're on Ubuntu Server 12.04.
I tried nfsstat, but it's not giving me much useful info about what's actually going on.
Edit – Additional stuff tried based on comments/answers:
Doing lsof -p 2774 on each of the PIDs shows the following:
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
nfsd 2774 root cwd DIR 8,1 4096 2 /
nfsd 2774 root rtd DIR 8,1 4096 2 /
nfsd 2774 root txt unknown /proc/2774/exe
Does that mean no files are being accessed?
When I try to view a process with strace -f -p 2774 it gives me this error:
attach: ptrace(PTRACE_ATTACH, ...): Operation not permitted
Could not attach to process. If your uid matches the uid of the target
process, check the setting of /proc/sys/kernel/yama/ptrace_scope, or try
again as the root user. For more details, see /etc/sysctl.d/10-ptrace.conf
Running tcpdump | grep nfs shows tons of NFS activity between two of our servers, but as far as I'm aware there shouldn't be any. A lot of entries like:
13:56:41.120020 IP 192.168.0.20.nfs > 192.168.0.21.729: Flags [.], ack 4282288820, win 32833, options [nop,nop,TS val 627282027 ecr 263985319,nop,nop,sack 3 {4282317780:4282319228}{4282297508:4282298956}{4282290268:4282291716}], len
Best Answer
In this kind of situation I have often found it very useful to capture the NFS traffic (e.g., with tcpdump or Wireshark) and look through it to see whether there is a specific reason for the high load.
For example, you can use something like:
tcpdump -w filename.cap "port 2049"
to save only NFS traffic (which uses port 2049) to a capture file (filename.cap is just a placeholder name). You can then open that file on a PC with Wireshark and analyze it in more detail. The last time I had a similar problem, it turned out to be a bunch of computation jobs from a single user who was over disk quota: the clients (18 different machines) kept retrying their writes, which drove the load on the old NFS server very high.
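If you want a quick look before opening Wireshark, a rough per-client packet count can already tell you which machines are hammering the server. The following is only a sketch: the function name nfs_top_talkers is made up for this example, and it assumes tcpdump's default one-line text output for IPv4 ("TIME IP SRC.PORT > DST.PORT: ..."), with the NFS port shown either as .nfs or, with -n, as .2049.

```shell
# Hypothetical helper: reads tcpdump text output on stdin and prints
# "COUNT HOST" for every peer of the NFS port, busiest first.
# Assumes lines of the form "TIME IP SRC.PORT > DST.PORT: ...".
nfs_top_talkers() {
    awk '
    $2 == "IP" {
        src = $3
        dst = $5
        sub(/:$/, "", dst)                  # drop the trailing colon after DST.PORT
        if (src ~ /\.(nfs|2049)$/)          # packet sent from the NFS port
            peer = dst
        else if (dst ~ /\.(nfs|2049)$/)     # packet sent to the NFS port
            peer = src
        else
            next                            # not NFS traffic, skip it
        sub(/\.[^.]+$/, "", peer)           # strip the port, keep only the host
        count[peer]++
    }
    END { for (p in count) print count[p], p }
    ' | sort -rn
}
```

You could feed it a saved capture with something like tcpdump -n -r filename.cap | nfs_top_talkers, where filename.cap stands for whatever capture file you wrote. It only counts packets, not bytes or NFS operations, so treat it as a first hint about who is generating the load rather than a replacement for looking at the calls in Wireshark.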