I might be a bit late on this, but for the sake of future visitors I will describe the whole process of adapting code that previously ran locally so that it becomes Google Cloud ML-aware from the I/O point of view.
- Python's standard `open(file_name, mode)` does not work with buckets (`gs://...../file_name`). You need to `from tensorflow.python.lib.io import file_io` and change all calls from `open(file_name, mode)` to `file_io.FileIO(file_name, mode=mode)` (note the named `mode` parameter). The interface of the opened handle is the same.
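To keep one code path for local runs and Cloud ML jobs, a small dispatching helper can make the switch transparent. This is a sketch under my own assumptions, not part of any API; `open_anywhere` is a hypothetical name:

```python
def open_anywhere(file_name, mode="r"):
    """Hypothetical helper: use file_io.FileIO for gs:// paths and the
    built-in open() otherwise. The returned handles share the same interface."""
    if file_name.startswith("gs://"):
        # Requires TensorFlow; note the named ``mode`` parameter.
        from tensorflow.python.lib.io import file_io
        return file_io.FileIO(file_name, mode=mode)
    return open(file_name, mode)

# Local paths keep working even without TensorFlow installed:
with open_anywhere("/tmp/open_anywhere_demo.txt", "w") as f:
    f.write("hello")
with open_anywhere("/tmp/open_anywhere_demo.txt") as f:
    print(f.read())  # prints "hello"
```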
- Keras and/or other libraries mostly use the standard `open(file_name, mode)` internally. As a result, calls like `trained_model.save(file_path)` that go through third-party libraries will fail to store the result in the bucket. The only way to retrieve a model after the job has finished successfully is to store it locally and then move it to the bucket.
The code below is quite inefficient, because it loads the whole model into memory at once and then dumps it to the bucket, but it worked for me for relatively small models:

```python
model.save(file_path)

# Copy the locally saved model file into the bucket.
with file_io.FileIO(file_path, mode='rb') as input_f:
    with file_io.FileIO(os.path.join(model_dir, file_path), mode='wb+') as output_f:
        output_f.write(input_f.read())
```
The mode must be set to binary for both reading and writing.
When the file is relatively big, it makes sense to read and write it in chunks to decrease memory consumption.
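A chunked copy can be sketched as follows; `copy_in_chunks` is a hypothetical helper of my own, and it works with any pair of open file-like objects, whether they come from the built-in `open()` or from `file_io.FileIO`:

```python
def copy_in_chunks(src, dst, chunk_size=1024 * 1024):
    """Copy between two open file-like objects in fixed-size chunks,
    so the whole file never sits in memory at once."""
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)

# Demonstration with local files; on Cloud ML you would open the
# destination with file_io.FileIO(..., mode='wb+') instead.
with open("/tmp/chunk_src.bin", "wb") as f:
    f.write(b"x" * 3000)
with open("/tmp/chunk_src.bin", "rb") as src, \
        open("/tmp/chunk_dst.bin", "wb") as dst:
    copy_in_chunks(src, dst, chunk_size=1024)
```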
- Before running a real task, I would advise running a stub that simply saves a file to the remote bucket. This implementation, temporarily put in place of the real `train_model` call, should do:
```python
import argparse
import os

from tensorflow.python.lib.io import file_io

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument(
        '--job-dir',
        help='GCS location with read/write access',
        required=True
    )
    args = parser.parse_args()
    arguments = args.__dict__
    job_dir = arguments.pop('job_dir')

    # Text mode is fine for this plain-text stub; use binary mode
    # ('rb'/'wb+') when copying binary files such as saved models.
    with file_io.FileIO(os.path.join(job_dir, "test.txt"), mode='w+') as of:
        of.write("Test passed.")
```
After a successful execution you should see the file `test.txt` with the content "Test passed." in your bucket.
Best Answer
A bit late, but for the sake of upcoming developers I'll try to answer the question.
The procedure is the same as on your local computer, with just two differences:
To download the model from Google Colab:
To upload the model to Google Colab:
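The original snippets for these two steps are not included here; a minimal sketch using the standard `google.colab.files` helpers might look like the following (the file name `model.h5` is a placeholder, and the calls only work inside a Colab runtime):

```python
try:
    # The google.colab package is only available inside a Colab runtime.
    from google.colab import files
    IN_COLAB = True
except ImportError:
    IN_COLAB = False

if IN_COLAB:
    # Download a saved model file from the Colab VM to your machine.
    files.download('model.h5')  # placeholder file name

    # Upload files from your machine to the Colab VM; opens a file picker
    # and returns a dict mapping uploaded file names to their contents.
    uploaded = files.upload()
```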
You can check this notebook for more i/o options: https://colab.research.google.com/notebooks/io.ipynb
All the other steps are performed the same way you do on your local computer. Hope this can help you.