Linux – Inode tables shrinking sharply over time causing rsync/inode problems

linux, rsync, xfs

We set up a Linux system on Amazon AWS (a CentOS-like distribution, though we are not sure exactly what customizations have been made to it) with 4TB of storage as an XFS volume over LVM (ultimately to be served over NFSv4, though it is not in use for that yet). We are in the process of using rsync to sync files from one of our production NFS servers to this XFS volume, i.e. we rsync from an NFS-mounted source to the locally mounted XFS-on-LVM volume. However, we noticed that at some point rsync became increasingly sluggish (throughput dropped sharply), while load average and memory consumption rose considerably (with the CPU spending a very large proportion of its time in iowait). I eventually rebooted the XFS system and it apparently returned to normal, with more typical rsync performance since then, at least for the past 24 hours.
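
For concreteness, the setup is roughly along the lines of the sketch below; the volume group, device names, mount points and rsync flags are illustrative rather than our exact commands:

    # Carve the ~4TB logical volume out of an existing volume group (names illustrative)
    lvcreate -L 4T -n data vg0

    # Create the XFS filesystem and mount it with 64-bit inode numbers enabled
    mkfs.xfs /dev/vg0/data
    mount -o inode64 /dev/vg0/data /mnt/data

    # Sync from the NFS-mounted production source to the local XFS volume
    rsync -aH --numeric-ids /mnt/nfs-source/ /mnt/data/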

We checked the munin monitoring graphs and didn't notice anything obvious, but we did find that the "Inode table size" and "open inode" metrics (we checked the munin plugin implementation, which reads these values from /proc/sys/fs/inode-nr; a quick manual check is shown after the trace below) kept decreasing over time. Shortly before we observed rsync getting stuck, both metrics had dropped to a few thousand from several hundred thousand (our non-XFS servers stay at roughly 500k most of the time and show no monotonically decreasing trend over extended periods), and we saw kernel log entries like these:

ip-XX-XXX-XXX-XXX login: [395850.680006] hrtimer: interrupt took 20000573 ns
Sep 18 17:19:58 ip-XX-XXX-XXX-XXX kernel: [395850.680006] hrtimer: interrupt took 20000573 ns
[400921.660046] INFO: task rsync:7919 blocked for more than 120 seconds.
[400921.660066] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[400921.660077] rsync         D ffff880002fe4240     0  7919   7918 0x00000000
[400921.660093]  ffff8800683e5638 0000000000000282 ffff880000000000 0000000000014240
[400921.660131]  ffff8800683e5fd8 0000000000014240 ffff8800683e5fd8 ffff88000726da40
[400921.660153]  0000000000014240 0000000000014240 ffff8800683e5fd8 0000000000014240
[400921.660176] Call Trace:
[400921.660202]  [] schedule_timeout+0x1fd/0x270
[400921.660220]  [] ? pvclock_clocksource_read+0x58/0xd0
[400921.660234]  [] ? __raw_callee_save_xen_irq_enable+0x11/0x26
[400921.660247]  [] __down+0x76/0xc0
[400921.660262]  [] down+0x3b/0x50
[400921.660274]  [] ? _raw_spin_unlock_irqrestore+0x19/0x20
[400921.660314]  [] xfs_buf_lock+0x2b/0x80 [xfs]
[400921.660338]  [] _xfs_buf_find+0x139/0x230 [xfs]
[400921.660360]  [] xfs_buf_get+0x5b/0x160 [xfs]
[400921.660378]  [] xfs_buf_read+0x13/0xa0 [xfs]
[400921.660401]  [] xfs_trans_read_buf+0x197/0x2c0 [xfs]
[400921.660422]  [] xfs_read_agi+0x6f/0x100 [xfs]
[400921.660443]  [] xfs_ialloc_read_agi+0x29/0x90 [xfs]
[400921.660467]  [] xfs_ialloc_ag_select+0x12b/0x280 [xfs]
[400921.660485]  [] xfs_dialloc+0x3c7/0x870 [xfs]
[400921.660500]  [] ? pvclock_clocksource_read+0x58/0xd0
[400921.660509]  [] ? __raw_callee_save_xen_restore_fl+0x11/0x1e
[400921.660531]  [] xfs_ialloc+0x60/0x6a0 [xfs]
[400921.660550]  [] ? xlog_grant_log_space+0x39c/0x3f0 [xfs]
[400921.660566]  [] ? xen_spin_lock+0xa5/0x110
[400921.660583]  [] xfs_dir_ialloc+0x7d/0x2d0 [xfs]
[400921.660606]  [] ? xfs_log_reserve+0xe2/0xf0 [xfs]
[400921.660623]  [] xfs_create+0x3f7/0x600 [xfs]
[400921.660638]  [] ? __raw_callee_save_xen_restore_fl+0x11/0x1e
[400921.660655]  [] xfs_vn_mknod+0xa2/0x1b0 [xfs]
[400921.660678]  [] xfs_vn_create+0xb/0x10 [xfs]
[400921.660689]  [] vfs_create+0xa7/0xd0
[400921.660701]  [] do_last+0x529/0x650
[400921.660714]  [] ? get_empty_filp+0x75/0x170
[400921.660728]  [] do_filp_open+0x213/0x670
[400921.660744]  [] ? xen_spin_lock+0xa5/0x110
[400921.660753]  [] ? __raw_callee_save_xen_restore_fl+0x11/0x1e
[400921.660769]  [] ? alloc_fd+0x102/0x150
[400921.660780]  [] do_sys_open+0x64/0x130
[400921.660792]  [] ? __raw_callee_save_xen_irq_disable+0x11/0x1e
[400921.660804]  [] sys_open+0x1b/0x20
[400921.660815]  [] system_call_fastpath+0x16/0x1b
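
For reference, the values munin is graphing can be read directly from /proc; inode-nr holds the number of allocated in-core (VFS) inodes and how many of those are free:

    # allocated in-core inodes and free in-core inodes, respectively
    cat /proc/sys/fs/inode-nr

    # a slightly more detailed view of the same counters
    cat /proc/sys/fs/inode-state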

We also observed a drastic increase in "lookup" operations on the source NFS server when this happened; that metric had remained stable before we started to experience the rsync issue.
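
We were tracking these via munin as well; to sample them by hand (assuming the usual nfs-utils tools are installed), something like this works:

    # per-operation server-side RPC counters; the "lookup" column is what spiked for us
    nfsstat -s

    # the client-side view from the machine running rsync
    nfsstat -c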

We have not observed similar behaviour on our production volumes, which are ext3-based and in fact even larger. Other than the filesystem difference, the file servers are of a similar machine class and set up in a similar manner. Since we found that the inode table metrics on the XFS server are still on the same decreasing trend as in our earlier observation, even though we rebooted it only yesterday, I am concerned the same issue will haunt us again soon, and it likely reflects some issue with our setup, kernel, or whatever.

We were on inode64-mounted XFS volumes on x86_64 machines when we experienced this. Right now we have copied about 1.3TB of data to the XFS volume, whose capacity is approximately 4TB; we should have around 3TB of data in that volume once fully copied. The volume was created from scratch, so it has been mounted with inode64 from the very beginning, when there was no data inside; the filesystem should therefore be clean and the inodes evenly distributed.
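
For completeness, this is roughly how we verify the mount options and filesystem geometry (mount point illustrative):

    # the mount options actually in effect should include inode64
    grep xfs /proc/mounts

    # allocation group count/size, inode size and other geometry details
    xfs_info /mnt/data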

Any insights as to what might be the cause?

(P.S. we have in fact started seeing this again as of a few hours ago!)

Best Answer

Enabling XFS delayed logging (the delaylog mount option) might help (see http://en.wikipedia.org/wiki/XFS#Disadvantages). CentOS is (in)famous for using a heavily patched kernel, so a kernel upgrade may be needed.
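
If you try that, it is just a mount option; a sketch of what it might look like (the fstab line and mount point are illustrative, and note that delaylog requires a reasonably recent kernel, roughly 2.6.35 or later, becoming the default behaviour in later kernels):

    # /etc/fstab entry (illustrative)
    /dev/vg0/data  /mnt/data  xfs  inode64,delaylog,noatime  0 0

    # apply by unmounting and remounting the volume
    umount /mnt/data
    mount /mnt/data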