Synology NAS – rsync messing up versioning / deduplication

backup rsync synology

Is it true that Synology DSM 4.3's default rsync implementation is not able to handle "vast" amounts of data and could mess up versioning / deduplication? Could any of the variables (see detailed info below) make this that much more difficult?

Edit: I'm looking for nothing more than an answer as to whether the above claims are nonsense or could be true.

Detailed info:

At work, we've got a Synology NAS running at the office. A few designers work directly from this NAS. They have projects running which consist of high-resolution stock photos, large PSDs, PDFs and whatnot. We have a folder of approx. 430GB which contains only the currently running projects. This folder is supposed to be backed up to a datacenter weekly over our internet connection.

All of our IT is handled by a third party, which claims that our backup is reaching a size ("100GB+") at which the default rsync implementation of DSM (4.3) is unable to handle the vast amount of data going to the online backup (on one of their machines in their datacenter). They say the backup consists of about 10TB of data because rsync has problems with "versioning / de-duplication" (retention: 30 days) and goes haywire.

Because of this, they suggest using a "professional online backup service", which cranks up our costs per GB to the online backup significantly.

Best Answer

Rsync in and of itself doesn't choke on large file sizes or "too many" files. Depending on your situation, it could be (but is unlikely) that the rsync job each week is taking more than 1 week to complete, causing a new rsync job to begin before the previous rsync job finished.
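One way to rule out overlapping runs is to have the backup job take an exclusive lock before it starts rsync, and to skip the run if the lock is already held. The following is a minimal sketch, assuming Python is available on the machine that launches the job and that the system is Unix-like (it relies on `fcntl`). The function name and the lock path are hypothetical.

```python
import fcntl

def try_acquire_lock(lock_path):
    """Return an open file holding an exclusive lock, or None if another run holds it."""
    handle = open(lock_path, "w")
    try:
        # LOCK_NB makes the call fail immediately instead of waiting for the lock.
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        handle.close()
        return None
    return handle

# Hypothetical usage in the weekly job:
#
#   lock = try_acquire_lock("/tmp/weekly-backup.lock")  # path is an assumption
#   if lock is None:
#       print("previous backup still running, skipping this run")
#   else:
#       ...run rsync here...
#       lock.close()  # closing the file releases the lock
```

If the "skipping" message ever shows up in the job's log, the previous run really did take longer than the schedule interval.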

It is common knowledge among IT folks that transferring tons of little files takes a whole lot more time than transferring a few very large files, all else being equal (same internet speed, same amount of data, etc.). Take a look at this ("Transferring millions of images") as an example discussion on Stack Overflow, as well as this ("Which is faster, and why: transferring several small files or few large files?") as an example discussion here on Server Fault.
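As a rough illustration of why file count matters, here is a deliberately simplified model in Python: total time is the time to stream the bytes plus a fixed cost paid once per file. The uplink speed, file counts, and per-file overhead below are all made-up assumptions, not measurements from this setup, and real rsync behaviour is more complicated than this.

```python
def estimate_transfer_seconds(total_bytes, file_count, bytes_per_second,
                              per_file_overhead_seconds):
    """Simplified model: streaming time plus a fixed cost paid once per file."""
    streaming = total_bytes / bytes_per_second
    overhead = file_count * per_file_overhead_seconds
    return streaming + overhead

# Hypothetical numbers: the same 430 GB as many small files or as a few large ones.
total = 430 * 10**9
uplink = 10 * 10**6 / 8  # assumed 10 Mbit/s uplink, expressed in bytes per second
many_small = estimate_transfer_seconds(total, 2_000_000, uplink, 0.05)
few_large = estimate_transfer_seconds(total, 2_000, uplink, 0.05)

print(f"many small files: {many_small / 3600:.1f} h")
print(f"few large files:  {few_large / 3600:.1f} h")
```

The streaming term is identical in both cases; only the per-file term changes, which is the effect the linked discussions describe.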

So one option may be to compress the files/folders before running rsync, and then copy the compressed file to your off-site data center. That would save you off-site data storage costs anyway, although it does open up another can of worms.
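Packing a project folder into a single compressed archive before the transfer could look roughly like the sketch below. It uses Python's standard `tarfile` module. The function name and the paths in the usage comment are hypothetical, and this says nothing about how the archive would interact with the 30-day retention on the remote side.

```python
import os
import tarfile

def pack_directory(source_dir, archive_path):
    """Write source_dir and everything under it into one gzip-compressed tar file."""
    # Store entries relative to the folder name rather than the absolute path.
    base = os.path.basename(os.path.normpath(source_dir))
    with tarfile.open(archive_path, "w:gz") as archive:
        archive.add(source_dir, arcname=base)
    return archive_path

# Hypothetical usage (both paths are assumptions):
#
#   pack_directory("/volume1/projects", "/volume1/staging/projects.tar.gz")
#
# The resulting single file is what would then be handed to rsync.
```

Note that already-compressed content such as JPEG stock photos will not shrink much, so the main gain here is fewer files rather than fewer bytes.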

Your first step would be, of course, to figure out how long it takes the rsync job to run. Then figure out if you need to change your backup methodology by either compressing the data beforehand or moving to an alternative backup solution.
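Measuring the run time can be as simple as wrapping the command in a timer and comparing the result with the schedule interval. A minimal Python sketch follows. The helper names are hypothetical, and the rsync source and destination in the usage comment are placeholders, not the actual job configured on the NAS.

```python
import subprocess
import sys
import time

WEEK_SECONDS = 7 * 24 * 60 * 60

def timed_run(command):
    """Run a command and return (exit code, elapsed wall-clock seconds)."""
    start = time.monotonic()
    completed = subprocess.run(command)
    return completed.returncode, time.monotonic() - start

def overruns_schedule(elapsed_seconds, interval_seconds=WEEK_SECONDS):
    """True if a run took at least as long as the gap between scheduled runs."""
    return elapsed_seconds >= interval_seconds

# Hypothetical usage (source and destination are placeholders):
#
#   code, elapsed = timed_run([
#       "rsync", "-a", "--stats",
#       "/volume1/projects/", "backup@example.invalid:/backups/projects/",
#   ])
#   print(f"exit code {code}, took {elapsed / 3600:.1f} h,",
#         "overruns schedule" if overruns_schedule(elapsed) else "fits in a week")
```

The `--stats` option makes rsync print a summary of how many files and bytes were actually transferred, which helps tell a slow link apart from an unexpectedly large change set.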

By the way, as of this posting, Synology DSM 5.1 is the latest version, and 5.2 is in beta. You should update to DSM 5.1 if you haven't already. This would surely not hurt your situation.
