If you can trust your filesystem's timestamps (the examples below use ctime, the inode change time), you can speed things up by combining Rsync with the UNIX/Linux 'find' utility. 'find' can assemble a list of all files whose timestamps fall within the past day, and then pipe ONLY that shortened list of files/directories to Rsync. This is much faster than having Rsync compare the metadata of every single file on the sender against the remote server.
In short, the following command will execute Rsync ONLY on the list of files and directories that have changed in the last 24 hours: (Rsync will NOT bother to check any other files/directories.)
find /local/data/path/ -mindepth 1 -ctime -1 -print0 | xargs -0 -n 1 -I {} -- rsync -a {} remote.host:/remote/data/path/.
In case you're not familiar with the 'find' command, it recurses through a specific directory subtree, looking for files and/or directories that meet whatever criteria you specify. For example, this command:
find . -name '.svn' -type d -ctime -1 -print
will start in the current directory (".") and recurse through all sub-directories, looking for:
- any directories ("-type d"),
- named ".svn" ("-name '.svn'"),
- whose metadata changed in the last 24 hours ("-ctime -1").
It prints the full path name ("-print") of anything matching those criteria on the standard output. The options '-name', '-type', and '-ctime' are called "tests", and the option '-print' is called an "action". The man page for 'find' has a complete list of tests and actions.
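A quick scratch-directory demonstration of how the tests combine (all tests must match; the file names below are made up):

```shell
# Scratch tree with made-up names; only the directory named ".svn"
# satisfies both the -name and -type tests.
tmp=$(mktemp -d)
mkdir -p "$tmp/project/.svn" "$tmp/project/src"
touch "$tmp/project/.svn/entries" "$tmp/project/src/main.c"
find "$tmp" -name '.svn' -type d -print
```

Only "$tmp/project/.svn" is printed: main.c fails the -type d test, and the src directory fails the -name test.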
If you want to be really clever, you can use the 'find' command's '-cnewer' test instead of '-ctime' to make this process more fault-tolerant and flexible. '-cnewer' tests whether each file/directory in the tree has had its metadata modified more recently than some reference file. Use 'touch' to create the NEXT run's reference file at the beginning of each run, right before the 'find ... | rsync ...' command executes. Here's the basic implementation:
#!/bin/bash
# $RANDOM requires bash; quote paths in case of spaces.
curr_ref_file=$(ls /var/run/last_rsync_run.* 2>/dev/null | head -n 1)
next_ref_file="/var/run/last_rsync_run.$RANDOM"
touch "$next_ref_file"
if [ -n "$curr_ref_file" ]; then
    find /local/data/path/ -mindepth 1 -cnewer "$curr_ref_file" -print0 | xargs -0 -n 1 -I {} -- rsync -a {} remote.host:/remote/data/path/.
    rm -f "$curr_ref_file"
else
    # First run: no reference file yet, so transfer the whole tree.
    rsync -a /local/data/path/ remote.host:/remote/data/path/
fi
This script automatically knows when it was last run, and it only transfers files modified since the last run. While this is more complicated, it protects you against situations where you might have missed running the job for more than 24 hours, due to downtime or some other error.
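The '-cnewer' behavior is easy to verify locally with scratch files (the names here are illustrative):

```shell
# Illustrative names: only files whose status changed AFTER the
# reference file was touched are selected by -cnewer.
work=$(mktemp -d)
touch "$work/old.txt"
touch "$work/ref"     # plays the role of the script's reference file
sleep 1               # guarantee a strictly newer ctime
touch "$work/new.txt"
find "$work" -mindepth 1 -type f -cnewer "$work/ref" -print
```

Only "$work/new.txt" is printed; old.txt (and the reference file itself) are not newer than the reference, so they are skipped, exactly as unchanged files are skipped between runs of the script.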
Your interpretation is correct. If you have excluded files or directories from being transferred, --delete-excluded will remove them from the destination side if it finds them there. (The destination does not have to be the "remote server"; you can use rsync to copy from a server to the local computer.) For instance, if you use --exclude='*.o' --delete-excluded, then if rsync finds any files ending in .o on the destination side, it will remove them whether or not they exist in the source directory.
Best Answer
What you want to do should be possible with the --relative (or -R) option and a previous run of 'find' to generate a file list. Here you create a null-terminated list of files (only files, not directories) and feed it to rsync as the source list for its operation, informing rsync about the null-termination with -0 (--from0). This is useful to avoid problems with spaces and other special characters in file names.
From the rsync man page:
Use relative paths. This means that the full path names specified on the command line are sent to the server rather than just the last parts of the filenames. This is particularly useful when you want to send several different directories at the same time. For example, if you used this command:
rsync -av /foo/bar/baz.c remote:/tmp/
... this would create a file named baz.c in /tmp/ on the remote machine. If instead you used
rsync -avR /foo/bar/baz.c remote:/tmp/
then a file named /tmp/foo/bar/baz.c would be created on the remote machine -- the full path name is preserved.