Copy list of files (-I flag) with gsutil preserving path

Tags: copying, find, gcloud, google-cloud-storage, gsutil

I am trying to copy all pictures and static files to a bucket of mine in Google Cloud Platform.

I am attempting this command from the root dir of my app:

find -regextype posix-extended -iregex ".*\.(js|css|png|jpg|gif|ttf|cur|woff|eot)" | gsutil -m cp -I gs://example-bucket/

And my files are in folders like this for example:

./pictures/bg/img.png
./pictures/pictures/dog.jpg
./fonts/modern.woff

The -I flag tells gsutil to read the list of files from stdin; the -m flag makes the upload run in parallel (multi-threaded).

This works, and I see my files in the bucket. However, all files lose their original paths and land in the root of the bucket, like this:

gs://example-bucket/img.png
gs://example-bucket/dog.jpg
gs://example-bucket/modern.woff

The wanted result is this:

gs://example-bucket/pictures/bg/img.png
gs://example-bucket/pictures/pictures/dog.jpg
gs://example-bucket/fonts/modern.woff

I would like the files to preserve their original paths.

I also tried this and I get the same result:

gsutil -m cp -r ./**/*.{js,css,png,jpg,gif,ttf,cur,woff,eot} gs://example-bucket/

The only thing that seems to work is a for loop:

for i in ..get-files..
do
    gsutil cp "$i" "gs://example-bucket/$i"
done

And also

find ..find-expr.. -exec gsutil cp {} gs://example-bucket/{} \;

But both of those are too slow for my workflow.

Thanks in advance for your help.

Best Answer

Either approach (enumerating the files with find or using a gsutil recursive (**) wildcard) produces a list of path names for the source of the copy, and gsutil always 'flattens' the paths when run this way. gsutil behaves this way because we wanted it to work like the older Unix/Linux cp command, which likewise flattens the paths when given a list of sources, copying them all into a single destination directory.

To avoid having the paths flattened, you would need to generate a script that provides the full path for each object:

gsutil cp pictures/bg/img.png gs://example-bucket/pictures/bg/img.png
gsutil cp pictures/pictures/dog.jpg gs://example-bucket/pictures/pictures/dog.jpg
...
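
As a minimal sketch of how such a script could be generated, the helpers below read the path list from stdin and print one gsutil cp line per file. The names dest_url and gen_cp_commands are hypothetical, the bucket name is the one from the question, and the sketch assumes paths contain no single quotes or newlines.

```shell
#!/bin/sh
# Hypothetical helper names; the bucket name comes from the question.
BUCKET="gs://example-bucket"

# Map a relative local path (as printed by find) to its destination URL.
dest_url() {
    # ${1#./} drops the leading "./" that find prepends to each path.
    printf '%s/%s\n' "$BUCKET" "${1#./}"
}

# Read paths from stdin, one per line, and print one "gsutil cp" line each.
# Assumes paths contain no single quotes or newlines.
gen_cp_commands() {
    while IFS= read -r src; do
        printf "gsutil cp '%s' '%s'\n" "$src" "$(dest_url "$src")"
    done
}
```

Piping the find command from the question into gen_cp_commands and redirecting the output to a file gives a script that can be reviewed before it is run. Nothing is uploaded by the helpers themselves; they only print commands.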

To get parallelism you could run each command in the background:

gsutil cp pictures/bg/img.png gs://example-bucket/pictures/bg/img.png &
gsutil cp pictures/pictures/dog.jpg gs://example-bucket/pictures/pictures/dog.jpg &
...
wait

If you're copying a large number of files, you will probably need to limit parallelism to avoid overloading your machine (run N copies and wait, run the next N and wait, and so on).
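
The "N at a time" idea could be sketched roughly as follows. The function name run_in_batches is hypothetical; it assumes its input is one complete shell command per line (for example, one gsutil cp command per line) and that a batch size of N is passed as the first argument.

```shell
#!/bin/sh
# Hypothetical helper: read one shell command per line from stdin, start
# each in the background, and pause with "wait" after every N commands.
run_in_batches() {
    batch_size=$1
    started=0
    while IFS= read -r cmd; do
        # </dev/null keeps the background job from consuming the loop's input.
        sh -c "$cmd" </dev/null &
        started=$((started + 1))
        if [ "$started" -ge "$batch_size" ]; then
            wait            # block until the current batch has finished
            started=0
        fi
    done
    wait                    # finish the last, possibly partial, batch
}
```

Given a file of commands (hypothetically named upload.sh), "run_in_batches 8 < upload.sh" would keep at most eight copies running at once. Each batch waits for its slowest command, so this is simple rather than optimal.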

Related Solutions

Using JSON keys with google cloud gsutil

The short version is to run the following command and follow instructions:

gsutil config -e

The gsutil tool has built-in help covering its options and modes of operation. Running gsutil help creds (one of the help topics suggested when you run gsutil with no arguments) shows a section on "OAuth2 Service Account" with instructions for using a service account's JSON key file:

OAuth2 Service Account:

This is the preferred type of credential to use when authenticating on
behalf of a service or application (as opposed to a user). For example, if
you will run gsutil out of a nightly cron job to upload/download data,
using a service account allows the cron job not to depend on credentials of
an individual employee at your company. This is the type of credential that
will be configured when you run "gsutil config -e".

It is important to note that a service account is considered an Editor by
default for the purposes of API access, rather than an Owner. In particular,
the fact that Editors have OWNER access in the default object and
bucket ACLs, but the canned ACL options remove OWNER access from
Editors, can lead to unexpected results. The solution to this problem is to
ensure the service account is an Owner in the Permissions tab for your
project. To find the email address of your service account, visit the
`Google Developers Console <https://cloud.google.com/console#/project>`_,
click on the project you're using, click "APIs & auth", and click
"Credentials".

To create a service account, visit the Google Developers Console and then:

   - Click "APIs & auth" in the left sidebar.

   - Click "Credentials".

   - Click "Create New Client ID".

   - Select "Service Account" as your application type.

   - Save the JSON private key or the .p12 private key and password
     provided.

For further information about account roles, see:
  https://developers.google.com/console/help/#DifferentRoles

For more details about OAuth2 service accounts, see:
  https://developers.google.com/accounts/docs/OAuth2ServiceAccount

Gcloud: Copy files between two VM instances

You can set the service account scopes for Google Cloud APIs when creating your GCE instance. The service account on that instance is then authorized to make API calls on your behalf. You can refer to this link for more information.
