Linux – How to keep subtree removal (`rm -rf`) from starving other processes for Disk I/O

hard-drive io ionice linux rm

We have a very large (multi-GB) Nginx cache directory for a busy site, which we occasionally need to clear all at once. I've solved this in the past by moving the cache folder to a new path, making a new cache folder at the old path, and then running rm -rf on the old cache folder.
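For reference, the rotation looks roughly like the sketch below. The function name and the demo paths are placeholders for illustration, not our real layout, and the real cache directory would also need its owner and mode restored so Nginx can write to it.

```shell
#!/bin/sh
# Sketch of the move-then-delete rotation described above.
# rotate_and_clear is a hypothetical helper name; pass it the cache path.
rotate_and_clear() {
    cache_dir=$1
    old_dir="${cache_dir}.old.$$"

    # Renaming within one filesystem is a cheap metadata operation,
    # so the live path is only missing for an instant.
    mv "$cache_dir" "$old_dir" || return 1

    # Recreate an empty directory at the original path
    # (ownership and permissions are NOT restored here).
    mkdir "$cache_dir" || return 1

    # This is the expensive step that floods the disk with I/O.
    rm -rf "$old_dir"
}

# Demo against a throwaway directory rather than a real cache.
demo_root=$(mktemp -d)
mkdir "$demo_root/cache"
touch "$demo_root/cache/a" "$demo_root/cache/b"
rotate_and_clear "$demo_root/cache"
```

The rename itself is never the problem; it is the final rm -rf that competes with the read traffic.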

Lately, however, when I need to clear the cache on a busy morning, the I/O from rm -rf is starving my server processes of disk access, as both Nginx and the server it fronts for are read-intensive. I can watch the load average climb while the CPUs sit idle and rm -rf accounts for 98-99% of disk I/O in iotop.

I've tried ionice -c 3 when invoking rm, but it seems to have no appreciable effect on the observed behavior.
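The invocation was along the lines of the sketch below; the target directory here is a throwaway stand-in. One thing that may be worth checking alongside it: as far as I understand from the ionice man page, the idle class is only honored by I/O schedulers that implement priorities (CFQ, and BFQ on newer kernels), so the active scheduler for the device could matter. That is an assumption on my part and not something I have confirmed on this instance.

```shell
#!/bin/sh
# Show which I/O scheduler each block device uses; the active one
# is printed in [brackets] by the kernel.
for f in /sys/block/*/queue/scheduler; do
    if [ -r "$f" ]; then
        printf '%s: %s\n' "$f" "$(cat "$f")"
    fi
done

# Throwaway stand-in for the old cache directory (hypothetical path).
target=$(mktemp -d)
touch "$target/file1" "$target/file2"

# Class 3 is the "idle" I/O class. Fall back to a plain rm if ionice
# is missing or cannot set the priority in this environment.
ionice -c 3 rm -rf "$target" || rm -rf "$target"
```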

Is there any way to tame rm -rf to share the disk more? Do I need to use a different technique that will take its cues from ionice?

Update:

The filesystem in question is an AWS EC2 instance store (the primary disk is EBS). The /etc/fstab entry looks like this:

/dev/xvdb       /mnt    auto    defaults,nobootwait,comment=cloudconfig 0       2

Best Answer

All data below was gathered from this page; check out the writeup for details of how it was produced. Here are some options for deleting a large directory of files:

Command                                   Elapsed  System Time  %CPU  cs1* (Vol/Invol)
rsync -a --delete empty/ a                10.60    1.31         95%   106/22
find b/ -type f -delete                   28.51    14.46        52%   14849/11
find c/ -type f | xargs -L 100 rm         41.69    20.60        54%   37048/15074
find d/ -type f | xargs -L 100 -P 100 rm  34.32    27.82        89%   929897/21720
rm -rf f                                  31.29    14.80        47%   15134/11

*cs1 is the number of context switches (voluntary/involuntary)
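The rsync row from the table can be wrapped up as in the minimal sketch below. The function name and demo directory are hypothetical, and the figures above measure elapsed and system time rather than how fairly the disk is shared, so treat this as something to try and measure, not a guaranteed fix.

```shell
#!/bin/sh
# Empty a directory by syncing an empty directory over it.
# clear_with_rsync is a hypothetical helper name.
clear_with_rsync() {
    target=$1
    empty=$(mktemp -d) || return 1

    # --delete removes everything in target that is absent from the
    # (empty) source. Note that -a also copies the empty directory's
    # mode and timestamps onto target, so re-check permissions after.
    rsync -a --delete "$empty/" "$target/"
    status=$?

    rmdir "$empty"
    return "$status"
}

# Demo against a throwaway tree rather than a real cache.
demo=$(mktemp -d)
mkdir -p "$demo/sub"
touch "$demo/one" "$demo/sub/two"

if command -v rsync >/dev/null 2>&1; then
    clear_with_rsync "$demo"
else
    # rsync not installed: fall back to the find variant from the table,
    # extended to remove subdirectories as well.
    find "$demo" -mindepth 1 -delete
fi
```

Unlike rm -rf on the directory itself, this leaves the target directory in place and only removes its contents.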