If you can trust your filesystem's timestamps (the examples below use ctime, the inode change time), you can speed things up by combining Rsync with the UNIX/Linux 'find' utility. 'find' can assemble a list of all files whose timestamps fall within the past day, and then pipe ONLY that shortened list of files/directories to Rsync. This is much faster than having Rsync compare the metadata of every single file on the sender against the remote server.
In short, the following command will execute Rsync ONLY on the list of files and directories that have changed in the last 24 hours: (Rsync will NOT bother to check any other files/directories.)
find /local/data/path/ -mindepth 1 -ctime -1 -print0 | xargs -0 -n 1 -I {} -- rsync -a {} remote.host:/remote/data/path/.
In case you're not familiar with the 'find' command, it recurses through a specific directory subtree, looking for files and/or directories that meet whatever criteria you specify. For example, this command:
find . -name '.svn' -type d -ctime -1 -print
will start in the current directory (".") and recurse through all sub-directories, looking for:
- any directories ("-type d"),
- named ".svn" ("-name '.svn'"),
- whose metadata changed in the last 24 hours ("-ctime -1").
It prints the full path name ("-print") of anything matching those criteria on the standard output. The options '-name', '-type', and '-ctime' are called "tests", and the option '-print' is called an "action". The man page for 'find' has a complete list of tests and actions.
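A quick scratch-directory demonstration of how the tests combine (all tests must match; the file names below are made up):

```shell
# Scratch tree with made-up names; only the directory named ".svn"
# satisfies both the -name and -type tests.
tmp=$(mktemp -d)
mkdir -p "$tmp/project/.svn" "$tmp/project/src"
touch "$tmp/project/.svn/entries" "$tmp/project/src/main.c"
find "$tmp" -name '.svn' -type d -print
```

Only "$tmp/project/.svn" is printed: main.c fails the -type d test, and the src directory fails the -name test.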
If you want to be really clever, you can use the 'find' command's '-cnewer' test instead of '-ctime' to make this process more fault-tolerant and flexible. '-cnewer' tests whether each file/directory in the tree has had its metadata modified more recently than some reference file. Use 'touch' to create the NEXT run's reference file at the beginning of each run, right before the 'find ... | rsync ...' command executes. Here's the basic implementation:
#!/bin/bash
# $RANDOM requires bash; quote paths in case of spaces.
curr_ref_file=$(ls /var/run/last_rsync_run.* 2>/dev/null | head -n 1)
next_ref_file="/var/run/last_rsync_run.$RANDOM"
touch "$next_ref_file"
if [ -n "$curr_ref_file" ]; then
    find /local/data/path/ -mindepth 1 -cnewer "$curr_ref_file" -print0 | xargs -0 -n 1 -I {} -- rsync -a {} remote.host:/remote/data/path/.
    rm -f "$curr_ref_file"
else
    # First run: no reference file yet, so transfer the whole tree.
    rsync -a /local/data/path/ remote.host:/remote/data/path/
fi
This script automatically knows when it was last run, and it only transfers files modified since the last run. While this is more complicated, it protects you against situations where you might have missed running the job for more than 24 hours, due to downtime or some other error.
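The '-cnewer' behavior is easy to verify locally with scratch files (the names here are illustrative):

```shell
# Illustrative names: only files whose status changed AFTER the
# reference file was touched are selected by -cnewer.
work=$(mktemp -d)
touch "$work/old.txt"
touch "$work/ref"     # plays the role of the script's reference file
sleep 1               # guarantee a strictly newer ctime
touch "$work/new.txt"
find "$work" -mindepth 1 -type f -cnewer "$work/ref" -print
```

Only "$work/new.txt" is printed; old.txt (and the reference file itself) are not newer than the reference, so they are skipped, exactly as unchanged files are skipped between runs of the script.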
Your interpretation is correct. If you have excluded files or directories from being transferred, --delete-excluded will remove them from the destination side if it finds them there. (The destination does not have to be the "remote server"; you can use rsync to copy from a server to the local computer.) For instance, if you use --exclude='*.o' --delete-excluded, then if rsync finds any files ending in .o on the destination side, it will remove them whether or not they exist in the source directory.
Best Answer
What you want to do should be possible with the --relative (or -R) option and a previous run of 'find' to generate a file list. Here you create a null-terminated list of files (only files, not directories) and feed it to rsync as the source list for its operation, informing rsync about the null-termination with -0 (--from0). This is useful to avoid problems with spaces and other special characters in file names.
From the rsync man page:
Use relative paths. This means that the full path names specified on the command line are sent to the server rather than just the last parts of the filenames. This is particularly useful when you want to send several different directories at the same time. For example, if you used this command:
rsync -av /foo/bar/baz.c remote:/tmp/
... this would create a file named baz.c in /tmp/ on the remote machine. If instead you used
rsync -avR /foo/bar/baz.c remote:/tmp/
then a file named /tmp/foo/bar/baz.c would be created on the remote machine -- the full path name is preserved.