rsync incremental backup takes 3+ days

rsync

I've built an incremental backup solution that uses rsync to back up a few of our servers. A PHP script runs through config files to get the information for each server that needs to be backed up, then calls rsync to handle the incremental remote backup.

This works perfectly on all of our servers and takes just a few minutes to finish, all except for one. That server has a lot of data, and rsync seems to just hang on it: a single incremental backup takes over 3 days. My guess is that it's stuck building the file list.

When I run the command below on the folder I want to back up, here is the 'IUsed' result:

df -i folder/
54176307

Is this simply too much data for rsync to handle? Should I be looking into an alternative? The backup server is running rsync 3.0.8, but the clients being backed up are all on rsync 2.6.9. Do you think upgrading everything to 3.0.8 would make a difference and reduce the 3-day backup time for this server?

Thanks,
Jacob

Best Answer

I doubt that the upgrade alone will provide the sort of improvement you're looking for. At 72 hours, you'd probably want an order of magnitude performance increase (7.2 hours). If you're looking for 2-3 hours, good luck without an SSD and a good network.

With 55 million inodes (and presumably roughly as many files), you're going to have to seriously reconsider your approach. First, if you are using an ext variant, I'd consider benchmarking a different filesystem.

Second, if you're using an ext filesystem (say ext3/4), the first thing I'd do is turn off atime. With atime on, every time a file is read or looked at, the filesystem has to do a tiny write to disk, because atime means "access time". By turning it off, you lose the ability to see when a file was last accessed, but that's the way the cookie crumbles. If you are using a standard SATA disk, assume you can do about 100 I/O operations per second (IOPS). Each atime update costs one of those (worst case), which means roughly 100 files per second just to verify their existence, and actually reading each file costs even more IOPS. 55,000,000 / 100 = 550,000 s ≈ 152 hours. Even allowing for the kernel's very good algorithms for merging I/Os, you've probably found your bottleneck.
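The back-of-the-envelope arithmetic above can be checked in a one-liner (the 55 million file count and 100 IOPS figure are assumptions from this answer, not measured values):

```shell
#!/bin/sh
# Rough worst-case estimate: one atime write per file at ~100 IOPS
files=55000000
iops=100
seconds=$((files / iops))
hours=$((seconds / 3600))
echo "${seconds} seconds, roughly ${hours} hours"   # 550000 seconds, roughly 152 hours
```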

In your /etc/fstab, use the mount option:

noatime,nodiratime 

to completely disable atimes. noatime turns off access-time updates for files; nodiratime does the same for directories (on modern kernels noatime implies nodiratime, but it doesn't hurt to be explicit). If you have a lot of directories, you definitely want it off.

I bet this alone will help dramatically.
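You don't even need to reboot to try it; you can remount the filesystem in place first and only commit the change to fstab once you've confirmed the improvement (the mount point here is just an example, substitute your own):

```shell
# Remount an existing filesystem with atime updates disabled (example mount point)
mount -o remount,noatime,nodiratime /data

# Verify the active mount options
mount | grep ' /data '
```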

Here's an example fstab:

# /etc/fstab: static file system information.
#
# Use 'blkid' to print the universally unique identifier for a
# device; this may be used with UUID= as a more robust way to name devices
# that works even if disks are added and removed. See fstab(5).
#
# <file system> <mount point>   <type>  <options>       <dump>  <pass>
proc            /proc           proc    nodev,noexec,nosuid 0       0
# / was on /dev/sda1 during installation
UUID=66188c62-0d8c-46d8-a44f-8f673ca6ac98 /               ext4    errors=remount-ro,discard,noatime,nodiratime 0       1
# swap was on /dev/sda6 during installation
UUID=c3f40312-d6f9-4bb7-b426-602a4b7a6c47 none            swap    sw              0       0
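As a side note on the version question: rsync 3.x introduced incremental recursion, which starts transferring while the file list is still being built instead of scanning all 55 million entries up front, but it only kicks in when both ends run 3.x. A sketch of what the backup invocation might look like once both sides are upgraded (the paths, host, and link-dest layout here are hypothetical, not taken from your setup):

```shell
# Hypothetical incremental backup using hard links against the previous run.
# --link-dest makes unchanged files hard links into the last backup,
# so each run only stores changed data.
rsync -a --delete \
    --link-dest=/backups/server1/last \
    user@server1:/data/ \
    /backups/server1/current/
```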