Normally I have no problem finding answers online, but I have been trying to fix my downed server for three days now and I am no further along.
The server in question is a CentOS 5.5 machine running an OpenVZ kernel, 2.6.18-194.8.1.el5.028stab070.5 to be precise. We have no physical access to the server, nor any serial or network console. We do have a Debian-based rescue disk loaded over PXE.
The server has two disks, SDA:
Disk /dev/sda: 320.0 GB, 320071851520 bytes
255 heads, 63 sectors/track, 38913 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x00000000
   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *          524        1193     5381775  83  Linux
/dev/sda2                1         523     4200997  82  Linux swap / Solaris
/dev/sda3             1194       38913   302985900  83  Linux
Partition table entries are not in disk order
and SDB:
Disk /dev/sdb: 320.0 GB, 320072933376 bytes
255 heads, 63 sectors/track, 38913 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x00000000
   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1                1       38913   312568641  83  Linux
sda1 holds the root filesystem, sda2 is the swap space, and sda3 and sdb1 together form md0, which is mounted on /vz. Here is the fstab:
/dev/sda1    /       ext3     defaults    1 1
proc         /proc   proc     defaults    0 0
sysfs        /sys    sysfs    defaults    0 0
/dev/sda2    swap    swap     defaults    0 0
/dev/md0     /vz     ext3     defaults    0 0
and menu.lst/grub.conf
boot=/dev/sda
default=0
timeout=15
#splashimage=(hd0,0)/boot/grub/splash.xpm.gz
title CentOS (2.6.18-194.8.1.el5.028stab070.5)
        root (hd0,0)
        kernel /boot/vmlinuz-2.6.18-194.8.1.el5.028stab070.5 ro root=/dev/sda1
        initrd /boot/initrd-2.6.18-194.8.1.el5.028stab070.5.img
title CentOS (default)
        root (hd0,0)
        kernel /boot/bzImage ro root=/dev/sda1
Now, for some reason, the server went down and we have been unable to get it working again. Our provider tried to "fix" the problem by reinstalling GRUB from the aforementioned rescue disk, with sda1 mounted at /mnt:
grub-install --root-directory=/mnt '(hd0)'
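For reference, a fuller rescue-disk reinstall is often done from a chroot instead; this is only a sketch, assuming the Debian rescue environment and that /dev/sda1 holds the CentOS root (the /mnt mount point is from the provider's command above):

```shell
# Sketch only -- run as root from the rescue system; /dev/sda1 is assumed
# to be the CentOS root partition.
mount /dev/sda1 /mnt
mount --bind /dev  /mnt/dev
mount --bind /proc /mnt/proc
mount --bind /sys  /mnt/sys
# Reinstall GRUB legacy to the MBR of the first disk from inside the chroot,
# so it uses the CentOS system's own stage files and device.map.
chroot /mnt grub-install /dev/sda
```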
http://i.stack.imgur.com/1FRi6.jpg
This is a photo of the server's monitor output when it boots. It can clearly read /etc/fstab, since it tries to resume from our swap partition.
Because we don't have access to the server, we are using QEMU to try to debug the problem, but we can never be sure it behaves the same way. Also, every time we make changes on sda1 we need to reboot the server for QEMU to see the changes; any pointers for doing this another way are welcome. We use QEMU as follows:
qemu-system-x86_64 -vnc :0 -hda /dev/sda -hdb /dev/sdb
We've tried reinstalling the latest kernel, building a new initrd without RAID support and with ext3 support forced in, copying /boot from a server with the same setup, and installing GRUB on almost every possible disk and partition.
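The initrd rebuild step, as a sketch (run inside a chroot of the CentOS root; `--omit-raid-modules` and `--preload` are flags CentOS 5's mkinitrd documents, but double-check with `man mkinitrd` before relying on them):

```shell
# Sketch -- inside a chroot of the CentOS root partition.
# -f overwrites the existing image; the version string matches grub.conf above.
mkinitrd -f --omit-raid-modules --preload=ext3 \
    /boot/initrd-2.6.18-194.8.1.el5.028stab070.5.img \
    2.6.18-194.8.1.el5.028stab070.5
```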
I have no clue what else I can try. Any tips at all are greatly appreciated.
tl;dr: Server went down and won't boot, failing with a "could not find filesystem" error.
Best Answer
What does your grub.conf look like? Maybe there's suddenly a wrongly configured root= or init= parameter?
If that's not the case, is there a /dev directory present when you mount the CentOS / partition via your Debian-based rescue system? If there is, what does it look like? Does it match the /dev/sda1 information when you check it with stat or ls -lah? Maybe /dev/root is pointing to the wrong location (or has been created with the wrong mknod parameters). Yes, /dev should be quite dynamic these days, but you never know... back in the kernel 2.4 days I hosed one of my Gentoo systems and only got it booting again after I recreated some device nodes by hand.
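For example, from the rescue system (a sketch; the /mnt mount point is assumed, and the reported device numbers will differ on your system):

```shell
# Inspect the static device nodes on the mounted CentOS root.
ls -lah /mnt/dev/sda1 /mnt/dev/root
# stat can print the major/minor device numbers explicitly (in hex).
stat -c '%n: type=%F major=%t minor=%T' /mnt/dev/sda1
```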
Edit after grub.conf was added to original question:
The very first line (boot=/dev/sda) of your grub.conf looks suspicious. Try commenting it out and rebooting.
If that did not work, you might also want to check the /dev/sda1 device node with ls -lah and note the major/minor numbers it reports (values similar to 104, 1). Then recreate /dev/root with mknod, substituting your own major and minor values.
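A sketch of what that looks like (the 104, 1 pair is just the answer's example and the ls output line is illustrative; use the major/minor numbers that ls -lah actually reports for your /dev/sda1):

```shell
# Example ls -lah output for a block device -- "104, 1" is the major, minor pair:
#   brw-r----- 1 root disk 104, 1 Jan  1 00:00 /dev/sda1
# Recreate /dev/root as a block (b) device with that same major/minor pair:
mknod /dev/root b 104 1
```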
Hope this helps!