Cron – rsync changing already existing files

backupcronrsync

I have a simple line of rsync in my crontab that gets backup files from the prod server to another.

It looks like it is touching the already existing files in the destination folder. This way, the backup would incrementally take longer each interval.

Please take a look at the date and time the files below have changed.

How do I use rsync not to touch (and download?) the files it already has. I don't need any checksums calculated either, once the backups are created, they won't change anymore.

rsync -vzre 'ssh' stor@server:/backup/system/ /storage/share/Backup/Server

The files to be fetched:

-rw-r-x--- 1 root stor 896K Jun 22 05:02 giant-140622-etc.zip
-rw-r-x--- 1 root stor 620K Jun 22 05:02 giant-140622-sql.zip
-rw-r-x--- 1 root stor  84M Jun 22 05:02 giant-140622-www.zip
-rw-r-x--- 1 root stor 899K Jun 25 05:00 giant-140625-etc.zip
-rw-r-x--- 1 root stor 603K Jun 25 05:00 giant-140625-sql.zip
-rw-r-x--- 1 root stor  84M Jun 25 05:00 giant-140625-www.zip
-rw-r-x--- 1 root stor 899K Jun 28 05:00 giant-140628-etc.zip
-rw-r-x--- 1 root stor 620K Jun 28 05:00 giant-140628-sql.zip
-rw-r-x--- 1 root stor  86M Jun 28 05:00 giant-140628-www.zip
-rw-r-x--- 1 root stor 899K Jun 30 05:00 giant-140630-etc.zip
-rw-r-x--- 1 root stor 617K Jun 30 05:00 giant-140630-sql.zip
-rw-r-x--- 1 root stor  86M Jun 30 05:00 giant-140630-www.zip

The destination:

-rw-r-x--- 1 stor stor 896K Jun 30 06:06 giant-140622-etc.zip
-rw-r-x--- 1 stor stor 620K Jun 30 06:06 giant-140622-sql.zip
-rw-r-x--- 1 stor stor  84M Jun 30 06:06 giant-140622-www.zip
-rw-r-x--- 1 stor stor 899K Jun 30 06:06 giant-140625-etc.zip
-rw-r-x--- 1 stor stor 603K Jun 30 06:06 giant-140625-sql.zip
-rw-r-x--- 1 stor stor  84M Jun 30 06:06 giant-140625-www.zip
-rw-r-x--- 1 stor stor 899K Jun 30 06:06 giant-140628-etc.zip
-rw-r-x--- 1 stor stor 620K Jun 30 06:06 giant-140628-sql.zip
-rw-r-x--- 1 stor stor  86M Jun 30 06:06 giant-140628-www.zip
-rw-r-x--- 1 stor stor 899K Jun 30 06:07 giant-140630-etc.zip
-rw-r-x--- 1 stor stor 617K Jun 30 06:08 giant-140630-sql.zip
-rw-r-x--- 1 stor stor  86M Jun 30 06:10 giant-140630-www.zip

Update:

When I run the rsync command (with the --skip-existing arg) from the shell, it only downloads non-existing new files and skips the files it already has.

When investigating the behaviour of the exact same command run by a cronjob, the already existing files do change every cycle and the whole job takes incrementally longer each cycle (compare the times above, cronjob starting at 06:00, 2 minutes per file even if they already exist).

rsync -vzr --ignore-existing -e 'ssh -i /path/id_rsa -l backup' backup@flowl.info:/backup/system/ /nfs/share-private/Backup/Server

Update:

Here are the files form july, I put an extra blank line into, please see the times, which started by 06:01 and raise each new files.

-rw-r-x--- 1 stor stor 899K Jul  4 06:01 giant-140702-etc.zip
-rw-r-x--- 1 stor stor 621K Jul  4 06:01 giant-140702-sql.zip
-rw-r-x--- 1 stor stor  86M Jul  4 06:03 giant-140702-www.zip
                                       ^-- 01 to 03
-rw-r-x--- 1 stor stor 899K Jul  4 06:04 giant-140704-etc.zip
-rw-r-x--- 1 stor stor 634K Jul  4 06:05 giant-140704-sql.zip
-rw-r-x--- 1 stor stor  86M Jul  8 06:02 giant-140704-www.zip
                                       ^-- ???
-rw-r-x--- 1 stor stor 899K Jul  8 06:03 giant-140706-etc.zip
-rw-r-x--- 1 stor stor 629K Jul  8 06:03 giant-140706-sql.zip
-rw-r-x--- 1 stor stor  86M Jul  8 06:06 giant-140706-www.zip
                                       ^-- 03 - 06
-rw-r-x--- 1 stor stor 899K Jul  8 06:07 giant-140708-etc.zip
-rw-r-x--- 1 stor stor 629K Jul  8 06:07 giant-140708-sql.zip
-rw-r-x--- 1 stor stor  86M Jul  8 06:10 giant-140708-www.zip
                                       ^-- 07 - 10

Now when I imagine going on another month, the time would be like:

-rw-r-x--- 1 stor stor 899K Jul  8 06:32 giant-140808-etc.zip
-rw-r-x--- 1 stor stor 629K Jul  8 06:32 giant-140808-sql.zip
-rw-r-x--- 1 stor stor  86M Jul  8 06:35 giant-140808-www.zip
                                       ^-- what I imagine to happen

Best Answer

By default rsync will read the entire file on both source and destination, to verify that they are identical. This does not consume network bandwidth, as it will only be comparing a hash value. But it does spend time reading from the disk.

In one usage scenario, I found this to be terribly inefficient because the source files were only being appended to. I used the --size-only, which worked well for me.

There is a few other options, which look like they may be applicable, --append and --append-verify, but I haven't tested those myself.

It does not look like you have a directory with a lot of small files, so the time to read the directory listing from disk and stat each file, shouldn't be much of a problem.