Rsync --inplace not reading target file

backup incremental-backup rsync

I'm trying to optimize the daily backup of an LVM snapshot of a large MySQL database. It works quite well when I just cp the files (local RAID to other local RAID), with an average speed of ~100MB/s. But since the database files (600GB, most of it in two files of 350GB and 250GB) do not change very much over the course of one day, I thought it would be more efficient to copy only the changed blocks.

I'm using

rsync --safe-links --inplace -crptogx -B 8388608 /source/ /destination/

It worked, but it was slower than the simple copy, and I did not see any read activity on the target disk. My thought was that rsync would read (8MB) blocks from the source and the destination, compare their checksums, and copy the source block into the target file only if it had changed. Am I mistaken here? Why am I not seeing rsync read from the target in order to determine whether the blocks have changed?

Here are some graphs:

Disk usage: you see that rsync --inplace (only done for the bigger file on the last day) reduced the "dent" in the disk usage of /mnt/backup, meaning that it did indeed update the existing file in place.

IO stats: the backup is made from sda to sdb. Somehow there is a huge peak in reads from the source, followed by the "normal" read(source)+write(target) activity. I was expecting simultaneous reads from both devices with little write activity on the target.


Best Answer

What you are probably seeing is due to the way your files are changed and the way rsync calculates checksums. The rsync man page has a basic explanation in its section on --inplace:

          o      The efficiency of rsync's delta-transfer algorithm may be
                 reduced if some data in the destination file is overwritten
                 before it can be copied to a position later in the file.
                 This does not apply if you use --backup, since rsync is
                 smart enough to use the backup file as the basis file for
                 the transfer.

So you should probably either not use --inplace or use --backup to preserve the old copy of the file. That said, rsync seems to handle large files rather inefficiently, so it may not be the best tool for the job.
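
One way to check whether the delta-transfer algorithm is doing anything at all is to run the copy with --stats and look at the "Matched data" line. The sketch below follows the first suggestion (no --inplace). It also adds --no-whole-file, which is my own assumption and not something the question or the answer states: per the rsync man page, --whole-file is the default when both source and destination are local paths, and that switches delta transfer off. -c is left out because it makes rsync checksum every file in full on both sides before deciding what to transfer. delta_sync and its arguments are hypothetical names.

```shell
# Hypothetical wrapper: local sync with the delta algorithm forced on.
# $1 = source directory, $2 = destination directory.
delta_sync() {
    # --no-whole-file : assumption, overrides the local-copy default
    # --stats         : prints "Literal data" and "Matched data" totals
    rsync -rptgox --safe-links --no-whole-file --stats "$1"/ "$2"/
}
```

Called as delta_sync /source /destination, a large "Matched data" figure would suggest that blocks are being reused from the target. Without --inplace, rsync builds a temporary copy next to the target file, so the target needs room for the largest file twice.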

If you are using LVM and really want to transfer snapshot data, you might not want to run rsync, which is quite CPU- and I/O-intensive on both sides. Instead, you could copy the snapshot's CoW data over to the destination machine using lvmsync. This would spare you the I/O and the CPU cycles, at the price of a presumably larger transfer size.

Another approach to the problem would be to compute "dumb" block-device checksums (e.g. with MD5) and transfer only the differing blocks, as in this answer here on ServerFault or in the blocksync.py script (I've linked the most recently active fork of it). It would not depend on snapshots at all, but obviously you would want to create one for the duration of the copy to ensure that your data stays consistent.
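
The block-checksum idea can be sketched in plain shell. This is only an illustration, assuming GNU coreutils (dd, md5sum) and regular files of the same size; blocksync_sketch is a hypothetical helper and is not the linked script. For a block device, wc -c would have to be replaced by something like blockdev --getsize64.

```shell
# Hypothetical helper: rewrite only the blocks of DST whose MD5 differs from SRC.
# Usage: blocksync_sketch SRC DST [BLOCK_BYTES]; prints the number of blocks copied.
blocksync_sketch() {
    src=$1
    dst=$2
    bs=${3:-1048576}                      # block size in bytes, 1 MiB by default
    size=$(wc -c < "$src")                # regular files only (assumption)
    blocks=$(( (size + bs - 1) / bs ))    # round up to cover a partial last block
    i=0
    copied=0
    while [ "$i" -lt "$blocks" ]; do
        a=$(dd if="$src" bs="$bs" skip="$i" count=1 2>/dev/null | md5sum)
        b=$(dd if="$dst" bs="$bs" skip="$i" count=1 2>/dev/null | md5sum)
        if [ "$a" != "$b" ]; then
            # conv=notrunc keeps the rest of DST intact while block i is replaced
            dd if="$src" of="$dst" bs="$bs" skip="$i" seek="$i" count=1 \
               conv=notrunc 2>/dev/null
            copied=$((copied + 1))
        fi
        i=$((i + 1))
    done
    echo "$copied"
}
```

Both sides are read in full, but only changed blocks are written. Starting two dd processes per block is slow, so a real tool would keep the files open and hash in a single pass.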

If you are concerned about your database's write performance with active snapshots, you could also take a look at ddsnap, which contains several optimizations for snapshotting and volume replication that may address those concerns.

Related Solutions

Linux – Copying a large directory tree locally? cp or rsync

I would use rsync as it means that if it is interrupted for any reason, then you can restart it easily with very little cost. And being rsync, it can even restart part way through a large file. As others mention, it can exclude files easily. The simplest way to preserve most things is to use the -a flag – ‘archive.’ So:

rsync -a source dest

Although UID/GID and symlinks are preserved by -a (see -lpgo), your question implies you might want a full copy of the filesystem information, and -a doesn't include hard links, extended attributes, or ACLs (on Linux), nor any of those or resource forks (on OS X). Thus, for a robust copy of a filesystem, you'll need to include those flags:

rsync -aHAX source dest # Linux
rsync -aHE source dest  # OS X

A plain cp will start over from the beginning, though the -u flag will "copy only when the SOURCE file is newer than the destination file or when the destination file is missing". The -a (archive) flag makes the copy recursive and preserves permissions, and combined with -u it does not recopy finished files if you have to restart. So:

cp -au source dest
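
One caveat when restarting, sketched here with GNU cp assumed: if dest already exists as a directory, cp -au source dest copies into dest/source instead of resuming. Copying source/. keeps the target path the same on every run. copy_resume is a hypothetical helper name.

```shell
# Hypothetical helper: restartable local copy of a directory tree with GNU cp.
# $1 = source directory, $2 = destination directory (created if missing).
copy_resume() {
    # "$1"/. copies the contents of the source, so a second run updates "$2"
    # in place instead of creating "$2"/<basename of source>.
    mkdir -p "$2" && cp -au "$1"/. "$2"/
}
```

Running copy_resume source dest a second time should then only copy files that are missing from dest or newer in source.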

Git-style incremental backup with rsync

What you seem to be looking for is the --link-dest functionality that is part of rsync. What you seem to describe is exactly how dirvish operates.

The --link-dest option hard-links unchanged files in the destination path to their counterparts in another copy of the structure instead of copying them again.

With dirvish you perform an initial backup, which just uses rsync.

After that, each additional backup is hard-linked to the previous successful backup, meaning there is no duplication of files. You can directly access any single backup from within the vault, and each backup is a complete and full backup. You can remove previous backups at any time.

Here is a script that you can use to demonstrate.

# create test area (brace expansion needs bash)
mkdir -p /tmp/backuptest/{source,dest1,dest2,dest3}
for a in $(seq 10); do dd if=/dev/urandom of=/tmp/backuptest/source/file$a bs=1M count=1; done

# look
find /tmp/backuptest/ -ls ; du -h /tmp/backuptest/

# initial backup
rsync -va /tmp/backuptest/source/ /tmp/backuptest/dest1/

# look
find /tmp/backuptest/ -ls ; du -h /tmp/backuptest/

# make changes
rm /tmp/backuptest/source/file[2-4]
cat /tmp/backuptest/source/file[6-7] >/tmp/backuptest/source/file11

# new backup linked to previous
rsync -va /tmp/backuptest/source/ /tmp/backuptest/dest2/ --link-dest=/tmp/backuptest/dest1/

# look
find /tmp/backuptest/ -ls ; du -h /tmp/backuptest/

# make changes
rm /tmp/backuptest/source/file5
cat /tmp/backuptest/source/file[5-7] >/tmp/backuptest/source/file12

# new backup linked to previous
rsync -va /tmp/backuptest/source/ /tmp/backuptest/dest3/ --link-dest=/tmp/backuptest/dest2/

# look
find /tmp/backuptest/ -ls ; du -h /tmp/backuptest/

# remove dest1
rm -r /tmp/backuptest/dest1/

# dest2 and dest3 are still complete backups of the state at those times
find /tmp/backuptest/ -ls ; du -h /tmp/backuptest/
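
The same idea can be wrapped into a small rotation helper. This is a sketch of a common pattern, not how dirvish itself is implemented; snapshot_backup, the vault layout and the "latest" symlink are all assumptions made for illustration.

```shell
# Hypothetical helper: one directory per backup, unchanged files hard-linked
# to the previous backup via --link-dest.
# $1 = source dir, $2 = vault dir (must exist), $3 = name of the new backup.
snapshot_backup() {
    src=$1
    vault=$2
    stamp=$3
    if [ -e "$vault/latest" ]; then
        # A relative --link-dest is resolved against the destination directory,
        # so ../latest points at "$vault/latest".
        rsync -a --link-dest=../latest "$src"/ "$vault/$stamp"/
    else
        rsync -a "$src"/ "$vault/$stamp"/      # first backup, nothing to link to
    fi &&
    ln -sfn "$stamp" "$vault/latest"           # advance the pointer only on success
}
```

For example, snapshot_backup /tmp/backuptest/source /tmp/backuptest/vault day1 followed by the same call with day2 should leave unchanged files in day1 and day2 sharing an inode, which ls -li would show.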
