S3cmd with --delete-removed

amazon-s3 s3cmd

I'm currently writing a script to sync files in S3 buckets with s3cmd.

I checked the documentation, which gives two forms:

s3cmd sync LOCAL_DIR s3://BUCKET[/PREFIX] or s3://BUCKET[/PREFIX] LOCAL_DIR

I also found a useful option:

--delete-removed
         Delete remote objects with no corresponding local file [sync]

I tested the first form of s3cmd sync with --delete-removed:

s3cmd sync -r --delete-removed LOCAL_DIR s3://BUCKET[/PREFIX]

It works like a charm: any file in the S3 bucket that is not in my LOCAL_DIR gets deleted.

However, when I try the second form:

s3cmd sync -r --delete-removed s3://BUCKET[/PREFIX] LOCAL_DIR

s3cmd seems to first delete all the files under LOCAL_DIR and then download the files from the S3 bucket into LOCAL_DIR.

This is clearly a waste of time. Is there a better way to sync without deleting all my local files first? That is, I want LOCAL_DIR to end up as an exact copy of the files in the S3 bucket.

Best Answer

Take care with your trailing slash (or lack of slash) in path names. It makes a difference.

From the s3cmd sync documentation (http://s3tools.org/s3cmd-sync):

Important — in both cases just the last part of the path name is taken into account. In the case of dir1 without trailing slash (which would be the same as, say, ~/demo/dir1 in our case) the last part of the path is dir1 and that’s what’s used on the remote side, appended after s3://s3…/path/ to make s3://s3…/path/dir1/….

On the other hand in the case of dir1/ (note the trailing slash), which would be the same as ~/demo/dir1/ (trailing slash again) is actually similar to saying dir1/* – ie expand to the list of the files in dir1. In that case the last part(s) of the path name are the filenames (file1-1.txt and file1-2.txt) without the dir1/ directory name. So the final S3 paths are s3://s3…/path/file1-1.txt and s3://s3…/path/file1-2.txt respectively, both without the dir1/ member in them. I hope it’s clear enough, if not ask in the mailing list or send me a better wording ;-)
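
To make the quoted rule concrete, here is a minimal sketch in plain shell. It does not call s3cmd at all. remote_uri is a hypothetical helper written only for illustration; it models the upload naming rule described above, and s3://BUCKET/path/ is a placeholder destination.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of s3cmd): prints the S3 URI that a local
# file would map to under the naming rule quoted above.
# Arguments: <source dir as typed> <file name inside it> <destination prefix ending in />
remote_uri() {
    local src=$1 file=$2 dest=$3
    case $src in
        */)
            # Trailing slash: treated like src/*, so the directory name is dropped.
            printf '%s%s\n' "$dest" "$file"
            ;;
        *)
            # No trailing slash: the last path component is kept on the remote side.
            printf '%s%s/%s\n' "$dest" "$(basename "$src")" "$file"
            ;;
    esac
}

remote_uri dir1  file1-1.txt s3://BUCKET/path/   # prints s3://BUCKET/path/dir1/file1-1.txt
remote_uri dir1/ file1-1.txt s3://BUCKET/path/   # prints s3://BUCKET/path/file1-1.txt
```

If the two sides of a sync do not line up because of a missing or extra slash, --delete-removed will see every file on the destination as having no counterpart, which would explain the behaviour in the question. Running the real command with s3cmd's --dry-run option first should show what would be deleted or transferred before anything is touched.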