Ubuntu – How to recreate a working AMI from recovery snapshot after Aug 8 outage

Tags: amazon-ec2, amazon-ami, amazon-web-services, ubuntu

After Amazon's Aug 8 outage, all EBS-based AMIs stopped working for many users. The cause is corruption of some sectors in the snapshots that the AMIs are based on.

However, Amazon created recovery snapshots where the disk problems should be fixed. Those are named along the lines of "Recovery snapshot for vol-xxxxxxxx".

I created a new AMI from the recovery snapshot. Creating the AMI itself worked fine, but instances launched from it do not work: their state is "Running", yet I cannot ssh into the machine or reach any of the web services that should be running there. It boils down to this (from the System Log, accessible through the AWS Management Console):

EXT3-fs: sda1: couldn't mount because of unsupported optional features (240).

EXT2-fs: sda1: couldn't mount because of unsupported optional features (244).

Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(8,1)

I've attached a volume created from that recovery snapshot to another server on AWS, and everything looks quite normal there. For example, fsck says:

$ sudo fsck -a /dev/xvdg
fsck from util-linux-ng 2.17.2
uec-rootfs: clean, 53781/524288 files, 546065/2097152 blocks

In one of the AWS forum discussions, I found this advice from someone with similar problems:

A work around will be to make a volume from the snapshot and attach
it to a running instance, use fsck –force to force the checking of
the filesystem and once cleared, you can make a snapshot and use it
for the AMI.

But I don't know how to force fsck on Ubuntu (11.04):

$ sudo fsck --force /dev/xvdg
fsck from util-linux-ng 2.17.2
fsck.ext3: invalid option -- 'o'

Does anyone know how to force a file system check on the volume on Ubuntu? Any other ideas on how to launch working instances based on the recovery snapshot?
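On the narrow question of forcing the check: the ext2/3/4 checker (`e2fsck`, which `fsck` dispatches to) spells the option `-f`, not `--force`, which is why the long form above is rejected. A minimal sketch follows, assuming the volume is attached as `/dev/xvdg` as in the output above and is not mounted. The `force_fsck` helper and the `DRY_RUN` switch are hypothetical names added for illustration; by default the helper only prints the command it would run.

```shell
#!/bin/sh
# Force a full check of an ext2/3/4 filesystem even if it is marked clean.
# /dev/xvdg is the device from the question; adjust it to your attachment point.
# The volume should be unmounted while e2fsck runs.
force_fsck() {
    dev="$1"
    # -f: check even when the superblock says "clean"
    # -n: open read-only and answer "no" to every prompt (report only)
    set -- e2fsck -f -n "$dev"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$@"      # illustration only: show the command
    else
        sudo "$@"      # DRY_RUN=0: actually run the check
    fi
}

force_fsck /dev/xvdg
```

Run with `DRY_RUN=0` to perform the check. Once the read-only report looks sane, `-n` can be swapped for `-p` to let `e2fsck` apply safe repairs automatically. Note that a clean forced check does not by itself explain the mount failure in the System Log.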

Right now it looks like it might be quicker to just start over from a clean Ubuntu AMI and re-setup all our services. 🙁 But of course I would prefer not to do that if there's any way to get the recovery snapshot to actually work.

Best Answer

I ran into the same problem when trying to duplicate a machine.

The problem turned out to be the kernel. Both when creating the AMI and when launching the instance, I had selected the default kernel image.
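The kernel explanation fits the log lines in the question. As far as I know, the number in "unsupported optional features (240)" is the filesystem's incompatible-feature mask, and the ext2/ext3 drivers print it in hexadecimal; treat both points as assumptions. Under them, the mask can be decoded with a small sketch like this (`decode_incompat` is a hypothetical helper, and the bit names follow the ext2/3/4 on-disk format):

```shell
#!/bin/sh
# Decode the mask from "unsupported optional features (NNN)".
# Assumption: the kernel prints the mask in hexadecimal.
decode_incompat() {
    mask=$((0x$1))
    out=""
    # bit:name pairs for the ext2/3/4 "incompat" feature flags
    for pair in 1:COMPRESSION 2:FILETYPE 4:RECOVER 8:JOURNAL_DEV \
                16:META_BG 64:EXTENTS 128:64BIT 256:MMP 512:FLEX_BG; do
        bit=${pair%%:*}
        name=${pair#*:}
        if [ $((mask & bit)) -ne 0 ]; then
            out="${out:+$out }$name"
        fi
    done
    echo "$out"
}

decode_incompat 240   # value from the EXT3-fs message
decode_incompat 244   # value from the EXT2-fs message
```

Read this way, 240 decodes to EXTENTS and FLEX_BG, which are ext4 features, and 244 adds RECOVER (a journal that needs replay). That would mean the root filesystem is ext4 and the default kernel simply cannot mount it, while the filesystem itself is healthy, which matches the clean fsck result.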

To resolve the problem, I recreated the AMI using the same kernel image as the original instance.
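With today's AWS CLI, which postdates this question, the same fix might be sketched as follows. Every ID, the AMI name `recovered-ami`, and the `x86_64` architecture are placeholders or assumptions, and `show_kernel` and `register_cmd` are hypothetical helpers. `/dev/sda1` is taken from the System Log above. The script only prints the `register-image` command rather than running it.

```shell
#!/bin/sh
# All IDs below are hypothetical placeholders; substitute your own.
ORIG_INSTANCE="i-xxxxxxxx"      # instance that was running the original AMI
RECOVERY_SNAP="snap-xxxxxxxx"   # Amazon's recovery snapshot

# 1. Look up the kernel image (AKI) used by the original instance.
#    Needs configured AWS credentials, so it is defined but not called here.
show_kernel() {
    aws ec2 describe-instance-attribute \
        --instance-id "$1" --attribute kernel \
        --query 'KernelId.Value' --output text
}

# 2. Build a register-image command that pins that kernel explicitly
#    instead of accepting the default.
register_cmd() {
    snap="$1"
    kernel="$2"
    printf '%s\n' "aws ec2 register-image --name recovered-ami --architecture x86_64 --kernel-id $kernel --root-device-name /dev/sda1 --block-device-mappings 'DeviceName=/dev/sda1,Ebs={SnapshotId=$snap}'"
}

register_cmd "$RECOVERY_SNAP" "aki-xxxxxxxx"
```

Run `show_kernel "$ORIG_INSTANCE"` first, pass the ID it returns as the second argument to `register_cmd`, then review and run the printed command. Kernel IDs only apply to paravirtual AMIs, which is what instances of this era used.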