I am trying to copy all pictures and static files to a bucket of mine in Google Cloud Platform.
I am attempting this command from the root dir of my app:
find -regextype posix-extended -iregex ".*\.(js|css|png|jpg|gif|ttf|cur|woff|eot)" | gsutil -m cp -I gs://example-bucket/
And my files are in folders like this for example:
./pictures/bg/img.png
./pictures/pictures/dog.jpg
./fonts/modern.woff
The flag -I in the gsutil command tells it to load the list of files from stdin; the flag -m just makes the upload multi-threaded.
This all works and I can see my files in the bucket. However, all files lose their original paths and are sent to the root of the bucket, like this:
gs://example-bucket/img.png
gs://example-bucket/dog.jpg
gs://example-bucket/modern.woff
The wanted result is this:
gs://example-bucket/pictures/bg/img.png
gs://example-bucket/pictures/pictures/dog.jpg
gs://example-bucket/fonts/modern.woff
I would like the files to preserve their original paths.
I also tried this and I get the same result:
gsutil -m cp -r ./**/*.{js,css,png,jpg,gif,ttf,cur,woff,eot} gs://example-bucket/
The only thing that seems to work is a for loop:
for i in ..get-files..
do
gsutil cp "$i" "gs://example-bucket/$i"
done
And also
find ..find-expr.. -exec gsutil cp {} gs://example-bucket/{} \;
But both of those are too slow for my workflow.
Thanks in advance for your help.
Best Answer
Either approach (enumerating the files using find or using a gsutil recursive (**) wildcard) produces a list of path names for the source of the copy, and gsutil will always 'flatten' the paths when you run it this way. gsutil works this way because we wanted it to work similarly to the older Unix/Linux cp command (which would likewise flatten the paths when you specify the sources this way, all being copied into a single destination directory).
To avoid having the paths flattened, you would need to generate a script that provides the full destination path for each object.
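As a rough sketch of such a script generator: the helper below reads relative paths on stdin and prints one gsutil cp command per file, with the destination object name spelled out in full. The function name gen_copy_script and the file name upload.sh are made up for illustration, and the sketch assumes the paths contain no single quotes or newlines.

```shell
# Hypothetical helper (not part of gsutil): reads relative paths on stdin and
# prints one "gsutil cp" command per file with the full destination name.
# Assumption: paths contain no single quotes or newlines.
gen_copy_script() (
  bucket="$1"
  while IFS= read -r src; do
    dst="${src#./}"    # drop the leading "./" that find prints
    printf "gsutil cp '%s' 'gs://%s/%s'\n" "$src" "$bucket" "$dst"
  done
)

# Possible usage (left commented out so nothing is uploaded by accident):
#   find ..find-expr.. | gen_copy_script example-bucket > upload.sh
#   sh upload.sh
```

Each generated line names both the source and the destination, so nothing is left for gsutil to flatten.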
To get parallelism, you could run each command in the background by appending & to it.
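One way this might look, again as a sketch under the same assumptions (no single quotes or newlines in the paths): a generator that appends & to every command and ends the script with wait, so the script does not exit before the uploads finish. The name gen_background_script is hypothetical.

```shell
# Hypothetical helper: like a plain per-file script, but every command is
# sent to the background and a final "wait" blocks until all have finished.
# Assumption: paths contain no single quotes or newlines.
gen_background_script() (
  bucket="$1"
  while IFS= read -r src; do
    dst="${src#./}"    # drop the leading "./" that find prints
    printf "gsutil cp '%s' 'gs://%s/%s' &\n" "$src" "$bucket" "$dst"
  done
  echo "wait"
)

# Possible usage (commented out so nothing is uploaded by accident):
#   find ..find-expr.. | gen_background_script example-bucket | sh
```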
If you're copying a large number of files you probably need to limit parallelism to avoid overloading your machine (do N and then wait, do the next N and then wait, etc.)
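The "do N and then wait" idea could be sketched as follows. The generator emits a wait after every N background commands, plus a final wait for a partially filled last batch. The name gen_batched_script and the batch-size argument are illustrative, and a suitable N depends on your machine and network.

```shell
# Hypothetical helper: emits background "gsutil cp" commands in batches of
# $2, with a "wait" after each batch so at most $2 uploads run at once.
# Assumption: paths contain no single quotes or newlines.
gen_batched_script() (
  bucket="$1"
  batch="$2"
  count=0
  while IFS= read -r src; do
    dst="${src#./}"    # drop the leading "./" that find prints
    printf "gsutil cp '%s' 'gs://%s/%s' &\n" "$src" "$bucket" "$dst"
    count=$((count + 1))
    if [ $((count % batch)) -eq 0 ]; then
      echo "wait"      # batch is full: let it finish before starting more
    fi
  done
  # Wait for a final, partially filled batch.
  if [ $((count % batch)) -ne 0 ]; then
    echo "wait"
  fi
)

# Possible usage (commented out so nothing is uploaded by accident):
#   find ..find-expr.. | gen_batched_script example-bucket 8 | sh
```

Batching this way is simple but not optimal, since each batch runs only as fast as its slowest upload.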