Fix ‘Failed to Start Execute Cloud User/Final Scripts’ on Google Compute Instance

google-compute-engine

Question

How can I connect to a Google Compute Engine instance when I have no SSH keys and the Google user/final scripts will not run?

Problem Context

Google compute instance running Ubuntu 16.04 LTS.

I changed the instance's base version of Python in order to install a Python package that was very finicky about Python's 'setuptools'.

My theory is that this change broke the startup of the Google cloud scripts, which were the only way I had connected to the instance in the past.

Problem Details

We can verify the server is on, as it is serving bad nginx networks here: http://35.201.199.224/

I have a server log that I can share if anyone is willing to help me read it: https://pastebin.com/DF5wsLhH

The part I think is the most important is this snippet:

Mar 21 17:21:50 instance-1 systemd[1]: Started Google Compute Engine Network Daemon.
Mar 21 17:21:51 instance-1 google_network_daemon[1340]: Traceback (most recent call last):
Mar 21 17:21:51 instance-1 google_network_daemon[1340]:   File "/usr/bin/google_network_daemon", line 9, in <module>
Mar 21 17:21:51 instance-1 google_network_daemon[1340]:     load_entry_point('google-compute-engine==2.8.4', 'console_scripts', 'google_network_daemon')()
Mar 21 17:21:51 instance-1 google_network_daemon[1340]:   File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 542, in load_entry_point
Mar 21 17:21:51 instance-1 google_network_daemon[1340]:     return get_distribution(dist).load_entry_point(group, name)
Mar 21 17:21:51 instance-1 google_network_daemon[1340]:   File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 2569, in load_entry_point
Mar 21 17:21:51 instance-1 google_network_daemon[1340]:     return ep.load()
Mar 21 17:21:51 instance-1 google_network_daemon[1340]:   File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 2229, in load
Mar 21 17:21:51 instance-1 google_network_daemon[1340]:     return self.resolve()
Mar 21 17:21:51 instance-1 google_network_daemon[1340]:   File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 2235, in resolve
Mar 21 17:21:51 instance-1 google_network_daemon[1340]:     module = __import__(self.module_name, fromlist=['__name__'], level=0)
Mar 21 17:21:51 instance-1 google_network_daemon[1340]:   File "/usr/lib/python3/dist-packages/google_compute_engine/networking/network_daemon.py", line 26, in <module>
Mar 21 17:21:51 instance-1 google_network_daemon[1340]:     from google_compute_engine import config_manager
Mar 21 17:21:51 instance-1 google_network_daemon[1340]:   File "/usr/lib/python3/dist-packages/google_compute_engine/config_manager.py", line 23, in <module>
Mar 21 17:21:51 instance-1 google_network_daemon[1340]:     from google_compute_engine.compat import parser
Mar 21 17:21:51 instance-1 google_network_daemon[1340]:   File "/usr/lib/python3/dist-packages/google_compute_engine/compat.py", line 23, in <module>
Mar 21 17:21:51 instance-1 google_network_daemon[1340]:     import distro
Mar 21 17:21:51 instance-1 google_network_daemon[1340]: ModuleNotFoundError: No module named 'distro'

I think it cannot find this module as it is looking in the wrong python version.
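
The traceback is consistent with this theory: `ModuleNotFoundError` only exists in Python 3.6 and later, while the stock `python3` on Ubuntu 16.04 is 3.5, so the daemon appears to be running under a replaced interpreter that cannot see a `distro` package. Once a shell on the machine is available, a minimal sketch like the following could confirm which interpreter is in use and whether it can find the module. The function name `module_status` is made up for illustration.

```python
import importlib.util
import sys


def module_status(name):
    """Report whether the interpreter running this script can find `name`.

    `name` is expected to be a top-level module name such as 'distro'.
    """
    spec = importlib.util.find_spec(name)
    return {
        "interpreter": sys.executable,
        "version": "%d.%d" % sys.version_info[:2],
        "module": name,
        "found": spec is not None,
        # Path the module would be loaded from, or None if it was not found.
        "location": getattr(spec, "origin", None),
    }


if __name__ == "__main__":
    # 'distro' is the module the traceback reports as missing.
    for key, value in module_status("distro").items():
        print("{}: {}".format(key, value))
```

Running the script with each installed interpreter in turn shows which of them, if any, can import `distro`.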

Conclusion

I do understand that I was inexcusably lazy in connecting only through Google's 'Connect by SSH' button, but I naively assumed that arguably the best commercial developers in the western world would have everything under control for me.

Best Answer

You may have to do the following:

  1. shut down this VM (without deleting the disk!)
  2. create a new VM
  3. attach this disk to the new VM and mount it in read/write mode
  4. repair the Python installation so that the Google scripts can run again and enable your SSH session
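
The first three steps above can be sketched with the `gcloud` CLI. The Python helper below only builds and prints the commands rather than running them, so they can be reviewed first. The names `instance-1` (taken from the log hostname), `rescue-1`, and the zone `us-central1-a` are placeholders, and the sketch assumes the boot disk has the same name as the instance and can be detached once the instance is stopped.

```python
import shlex


def rescue_commands(broken_vm, rescue_vm, disk, zone):
    """Build (but do not run) gcloud commands for the rescue-VM approach."""
    return [
        # Step 1: stop the broken VM. Stopping does not delete its disk.
        ["gcloud", "compute", "instances", "stop", broken_vm, "--zone", zone],
        # Detach the disk so that another VM can attach it read/write.
        ["gcloud", "compute", "instances", "detach-disk", broken_vm,
         "--disk", disk, "--zone", zone],
        # Step 2: create a fresh VM in the same zone as the disk.
        ["gcloud", "compute", "instances", "create", rescue_vm, "--zone", zone],
        # Step 3: attach the disk to the rescue VM (read/write is the default mode).
        ["gcloud", "compute", "instances", "attach-disk", rescue_vm,
         "--disk", disk, "--zone", zone],
    ]


if __name__ == "__main__":
    # All four arguments are hypothetical; substitute your own values.
    for cmd in rescue_commands("instance-1", "rescue-1", "instance-1", "us-central1-a"):
        print(" ".join(shlex.quote(part) for part in cmd))
```

Step 4 then happens inside the rescue VM: the attached disk shows up as an additional block device, which can be mounted so that its Python installation can be repaired before the disk is moved back to the original instance.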

If this VM is serving live traffic that you cannot lose, consider the following:

  1. create a snapshot from this VM's persistent disk
  2. create a new GCE VM instance from the snapshot
  3. fix it so that it runs and is able to serve requests
  4. redirect traffic to this serving instance
  5. repair the original disk
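
Steps 1 and 2 of this variant can be sketched in the same way. The helper below builds the `gcloud` commands without executing them. Every name passed in (disk, snapshot, clone disk, clone VM, zone) is a placeholder. Note that a VM created from the snapshot starts with the same broken Python installation, so step 3 still has to be done by hand.

```python
import shlex


def snapshot_clone_commands(disk, snapshot, clone_disk, clone_vm, zone):
    """Build (but do not run) gcloud commands that clone a VM from a disk snapshot."""
    return [
        # Step 1: snapshot the persistent disk of the VM that is serving traffic.
        ["gcloud", "compute", "disks", "snapshot", disk,
         "--snapshot-names", snapshot, "--zone", zone],
        # Step 2a: create a new disk from that snapshot.
        ["gcloud", "compute", "disks", "create", clone_disk,
         "--source-snapshot", snapshot, "--zone", zone],
        # Step 2b: create a new VM that boots from the cloned disk.
        ["gcloud", "compute", "instances", "create", clone_vm,
         "--disk", "name={},boot=yes".format(clone_disk), "--zone", zone],
    ]


if __name__ == "__main__":
    # All five arguments are hypothetical; substitute your own values.
    commands = snapshot_clone_commands(
        "instance-1", "instance-1-snap", "instance-1-clone", "instance-2", "us-central1-a"
    )
    for cmd in commands:
        print(" ".join(shlex.quote(part) for part in cmd))
```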

Additionally, consider using containers to run dependencies that need specific versions of base tools (such as Python), which may otherwise conflict with system tools or external processes (such as the SSH before/after scripts).