Copy list of files (-I flag) with gsutil preserving path

Tags: copying, find, gcloud, google-cloud-storage, gsutil

I am trying to copy all pictures and static files to a bucket of mine in Google Cloud Platform.

I am attempting this command from the root dir of my app:

find -regextype posix-extended -iregex ".*\.(js|css|png|jpg|gif|ttf|cur|woff|eot)" | gsutil -m cp -I gs://example-bucket/

My files are in folders like these:

./pictures/bg/img.png
./pictures/pictures/dog.jpg
./fonts/modern.woff

The -I flag tells gsutil to read the list of files from stdin, and the -m flag just runs the upload in parallel (multi-threaded).

This all works fine and I can see my files in the bucket. However, all files lose their original paths and are sent to the root of the bucket, like this:

gs://example-bucket/img.png
gs://example-bucket/dog.jpg
gs://example-bucket/modern.woff

The wanted result is this:

gs://example-bucket/pictures/bg/img.png
gs://example-bucket/pictures/pictures/dog.jpg
gs://example-bucket/fonts/modern.woff

I would like the files to preserve their original paths.

I also tried this and I get the same result:

gsutil -m cp -r ./**/*.{js,css,png,jpg,gif,ttf,cur,woff,eot} gs://example-bucket/

The only thing that seems to work is a for loop:

for i in ..get-files..
do
    gsutil cp "$i" "gs://example-bucket/$i"
done

This also works:

find ..find-expr.. -exec gsutil cp {} gs://example-bucket/{} \;

But both of those are too slow for my workflow.

Thanks in advance for your help.

Best Answer

Either approach (enumerating the files with find or using a gsutil recursive wildcard, **) produces a list of path names as the source of the copy, and gsutil always 'flattens' the paths when you run it this way. gsutil works like this because we wanted it to behave like the older Unix/Linux cp command, which likewise flattens the paths when you give it a list of source files, copying them all into a single destination directory.

To avoid having the paths flattened, you would need to generate a script that provides the full destination path for each object:

gsutil cp pictures/bg/img.png gs://example-bucket/pictures/bg/img.png
gsutil cp pictures/pictures/dog.jpg gs://example-bucket/pictures/pictures/dog.jpg
...
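
One way such a script could be generated is sketched below. This is only an illustration, not something gsutil provides: gen_copy_script is a hypothetical helper name, and the sketch assumes GNU find (for -regextype and -printf), that it is run from the root dir of the app, and that no file name contains a single quote or a newline.

```shell
# Sketch only: print one "gsutil cp SRC DST" line per matching file.
# gen_copy_script is a hypothetical name; run it from the root dir of the app.
gen_copy_script() {
    bucket=$1
    # %P prints each path relative to the starting point, without the leading "./",
    # so the object name in the bucket mirrors the local relative path.
    find . -type f -regextype posix-extended \
        -iregex '.*\.(js|css|png|jpg|gif|ttf|cur|woff|eot)' -printf '%P\n' |
    while IFS= read -r path; do
        # Single quotes keep names with spaces intact
        # (assumes the name itself contains no single quote).
        printf "gsutil cp '%s' '%s'\n" "$path" "gs://$bucket/$path"
    done
}
```

For example, `gen_copy_script example-bucket > upload.sh` would write the commands to a file (upload.sh is an arbitrary name) that can be reviewed and then run with `sh upload.sh`.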

To get parallelism you could run each command in the background:

gsutil cp pictures/bg/img.png gs://example-bucket/pictures/bg/img.png &
gsutil cp pictures/pictures/dog.jpg gs://example-bucket/pictures/pictures/dog.jpg &
...
wait

If you're copying a large number of files, you will probably need to limit parallelism to avoid overloading your machine: start N copies and wait, then start the next N and wait, and so on.
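
As a rough sketch of bounded parallelism, the batching can be delegated to xargs rather than written by hand with & and wait. Note that this swaps in a different technique from the "do N and then wait" loop described above: xargs -P keeps up to N copies running at all times. The function name copy_preserving_paths and the default of 8 jobs are assumptions made for illustration; the sketch assumes GNU find and an xargs that supports -0, -P and -I.

```shell
# Sketch only: copy each matching file to the same relative path in the bucket,
# running at most max_jobs gsutil processes at a time.
# copy_preserving_paths is a hypothetical name; run it from the root dir of the app.
copy_preserving_paths() {
    bucket=$1
    max_jobs=${2:-8}   # assumed default; tune for your machine
    # %P drops the leading "./"; \0 delimiters keep unusual file names intact.
    find . -type f -regextype posix-extended \
        -iregex '.*\.(js|css|png|jpg|gif|ttf|cur|woff|eot)' -printf '%P\0' |
    # -I{} runs one "gsutil cp SRC DST" per file; -P caps how many run concurrently.
    xargs -0 -P "$max_jobs" -I{} gsutil cp {} "gs://$bucket/{}"
}
```

Calling `copy_preserving_paths example-bucket 4` would run up to four copies at once. Each file still starts its own gsutil process, so this is likely to remain slower than a single gsutil -m invocation.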