Google-compute-engine not booting up properly, unable to SSH

google-compute-engine

My instance on Google Compute Engine is not booting up properly, and I am unable to SSH into it. I have a lot of data on the instance. How can I recover it?

The logs are below. When I check from Windows whether the instance is on the network, I can reach its NAT IP, but I cannot SSH in (it was working fine before), nor can I SSH from the browser.

[    0.519999] md: autorun ...
[    0.520794] md: ... autorun DONE.
[    0.521761] VFS: Cannot open root device "sda1" or unknown-block(0,0): error -6
[    0.523744] Please append a correct "root=" boot option; here are the available partitions:
[    0.525886] Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)
[    0.527829] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.19.0-25-generic #26~14.04.1-Ubuntu
[    0.529875] Hardware name: Google Google, BIOS Google 01/01/2011
[    1.656059] ---[ end Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)
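
Boot logs like these can be read from the instance's serial console output even when SSH is down. A minimal sketch, assuming a hypothetical instance name my-instance in zone us-central1-a (substitute your own):

# Read the boot log (serial port 1) without needing SSH access
$ gcloud compute instances get-serial-port-output my-instance --zone us-central1-a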

Best Answer

During the migration from a trial to a paid account, I lost my running instance with similar symptoms. In my case, however, the "auto-delete the disk when deleting the instance" flag was checked, which prevented me from using the method described above. Here is how I was able to recover my drive.

First and foremost, do not delete your corrupted instance. You will need it.
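
Before doing anything else, it is also worth making sure the boot disk will survive even if the instance is deleted by accident. A minimal sketch, assuming the corrupted instance is named my-instance, its boot disk is my-disk-1, and both live in zone us-central1-a (hypothetical names, substitute your own):

# Clear the auto-delete flag on the corrupted instance's boot disk
$ gcloud compute instances set-disk-auto-delete my-instance --disk my-disk-1 --no-auto-delete --zone us-central1-a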

  1. From your main console, identify the name of the disk corresponding to the corrupted instance: gcloud compute disks list (a consolidated sketch of the full command sequence follows after this list).
  2. Create a snapshot of the drive that seems corrupted: gcloud compute disks snapshot my-disk-1 --snapshot-names snapshot-1
  3. Create and boot an instance from the newly created snapshot (make sure the auto-delete flag is turned off when creating it). Chances are the new instance will run into the exact same boot issue as the original one. That is fine this time, because you can now shut down and delete that instance without losing the drive, which should then appear when listing with gcloud compute disks list (say: new_disk).
  4. Once the instance has been deleted, you should be left with one new mountable drive. To use it, create a third instance with OS characteristics similar to the original one.
  5. From the Google Cloud console, using the gcloud command, attach the drive to the new instance (say ubuntu-trusty-3): gcloud compute instances attach-disk ubuntu-trusty-3 --disk DISK --device-name new_disk. You should now have two drives available on that instance.

$ sudo blkid
/dev/sda1: LABEL="cloudimg-rootfs" UUID="87f65d22-c9a9-428c-b1ab-b4ad9f8e4c05" TYPE="ext4"
/dev/sdb1: LABEL="cloudimg-rootfs" UUID="87f65d22-c9a9-428c-b1ab-b4ad9f8e4c05" TYPE="ext4"

  6. Reboot that instance if the drive does not show up in sudo blkid.
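
Put together, the commands for steps 1 through 5 look roughly like the following. The disk, snapshot, and instance names (my-disk-1, snapshot-1, recovery-disk, ubuntu-trusty-2, ubuntu-trusty-3) and the zone are placeholders; substitute your own and check each step's output before moving on.

# 1. Identify the disk behind the corrupted instance
$ gcloud compute disks list

# 2. Snapshot the suspect disk
$ gcloud compute disks snapshot my-disk-1 --snapshot-names snapshot-1 --zone us-central1-a

# 3. Create a disk from the snapshot, then boot an instance from that disk
#    (auto-delete=no keeps the disk around after the instance is deleted)
$ gcloud compute disks create recovery-disk --source-snapshot snapshot-1 --zone us-central1-a
$ gcloud compute instances create ubuntu-trusty-2 --disk name=recovery-disk,boot=yes,auto-delete=no --zone us-central1-a

#    If ubuntu-trusty-2 hits the same kernel panic, delete it but keep its disks
$ gcloud compute instances delete ubuntu-trusty-2 --keep-disks all --zone us-central1-a

# 4. Create a third instance with an OS similar to the original
$ gcloud compute instances create ubuntu-trusty-3 --image-family ubuntu-1404-lts --image-project ubuntu-os-cloud --zone us-central1-a

# 5. Attach the recovered disk to the new instance as a second drive
$ gcloud compute instances attach-disk ubuntu-trusty-3 --disk recovery-disk --device-name new_disk --zone us-central1-a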

Here's how it looked on my dashboard: [screenshot]

In my case, to my great surprise, the kernel booted from the recovered drive (gmap-server) and I was back in business. I have no idea how the kernel picked this one over the disk created along with the instance. If anyone knows, please chime in here.
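
If the kernel does not happen to boot from the recovered drive, the more conventional route is to mount the attached disk as a data volume and copy the files off. A rough sketch, assuming the recovered partition shows up as /dev/sdb1 as in the blkid output above:

# Mount the recovered root filesystem and copy data off
$ sudo mkdir -p /mnt/recovered
$ sudo mount /dev/sdb1 /mnt/recovered
$ ls /mnt/recovered        # the original files should be visible here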
