Normally I have no problem finding answers online, but I have been trying to fix my downed server for three days now and I am no further along.
The server in question is a CentOS 5.5 machine running an OpenVZ kernel, 2.6.18-194.8.1.el5.028stab070.5 to be precise. We have no physical access to the server, nor any serial or network console. We do have a Debian-based rescue disk loaded over PXE.
The server has two disks, SDA:
Disk /dev/sda: 320.0 GB, 320071851520 bytes
255 heads, 63 sectors/track, 38913 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x00000000
   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *          524        1193     5381775  83  Linux
/dev/sda2                1         523     4200997  82  Linux swap / Solaris
/dev/sda3             1194       38913   302985900  83  Linux
Partition table entries are not in disk order
and SDB:
Disk /dev/sdb: 320.0 GB, 320072933376 bytes
255 heads, 63 sectors/track, 38913 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x00000000
   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1                1       38913   312568641  83  Linux
sda1 holds the root filesystem, sda2 is the swap space, and sda3 and sdb1 together form md0, which is mounted on /vz. Here is the fstab:
/dev/sda1    /       ext3     defaults    1 1
proc         /proc   proc     defaults    0 0
sysfs        /sys    sysfs    defaults    0 0
/dev/sda2    swap    swap     defaults    0 0
/dev/md0     /vz     ext3     defaults    0 0
and menu.lst/grub.conf
boot=/dev/sda
default=0
timeout=15
#splashimage=(hd0,0)/boot/grub/splash.xpm.gz
title CentOS (2.6.18-194.8.1.el5.028stab070.5)
        root (hd0,0)
        kernel /boot/vmlinuz-2.6.18-194.8.1.el5.028stab070.5 ro root=/dev/sda1
        initrd /boot/initrd-2.6.18-194.8.1.el5.028stab070.5.img
title CentOS (default)
        root (hd0,0)
        kernel /boot/bzImage ro root=/dev/sda1
Now, for some reason, the server went down and we have been unable to get it working again. Our provider tried to "fix" the problem by reinstalling GRUB from the aforementioned rescue disk, with sda1 mounted at /mnt:
grub-install --root-directory=/mnt '(hd0)'
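For reference, a fuller rescue-disk reinstall is often done from a chroot instead; this is only a sketch, assuming the Debian rescue environment and that /dev/sda1 holds the CentOS root (the /mnt mount point is from the provider's command above):

```shell
# Sketch only -- run as root from the rescue system; /dev/sda1 is assumed
# to be the CentOS root partition.
mount /dev/sda1 /mnt
mount --bind /dev  /mnt/dev
mount --bind /proc /mnt/proc
mount --bind /sys  /mnt/sys
# Reinstall GRUB legacy to the MBR of the first disk from inside the chroot,
# so it uses the CentOS system's own stage files and device.map.
chroot /mnt grub-install /dev/sda
```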
http://i.stack.imgur.com/1FRi6.jpg
This is a photo of the server's monitor output when it boots. It can clearly read /etc/fstab, since it tries to resume from our swap partition.
Because we don't have access to the server, we are using QEMU to try to debug the problem, but we can never be sure it behaves the same way. Also, every time we make changes on sda1 we need to reboot the server for QEMU to see the changes; any pointers for doing this another way are welcome. We use QEMU as follows:
qemu-system-x86_64 -vnc :0 -hda /dev/sda -hdb /dev/sdb
We've tried reinstalling the latest kernel, building a new initrd without RAID support and with ext3 support forced in, copying /boot from a server with the same setup, and installing GRUB on almost every possible disk and partition.
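The initrd rebuild step, as a sketch (run inside a chroot of the CentOS root; `--omit-raid-modules` and `--preload` are flags CentOS 5's mkinitrd documents, but double-check with `man mkinitrd` before relying on them):

```shell
# Sketch -- inside a chroot of the CentOS root partition.
# -f overwrites the existing image; the version string matches grub.conf above.
mkinitrd -f --omit-raid-modules --preload=ext3 \
    /boot/initrd-2.6.18-194.8.1.el5.028stab070.5.img \
    2.6.18-194.8.1.el5.028stab070.5
```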
I have no clue what else I can try. Any tips at all are greatly appreciated.
tl;dr: Server went down and won't boot, failing with a "could not find filesystem" error.
Best Answer
What does your grub.conf look like? Maybe there's suddenly a wrongly configured root= or init= parameter?
If that's not the case, is there a /dev directory present when you mount the CentOS / partition via your Debian-based rescue system? If there is, what does it look like? Does it match the /dev/sda1 information when you check it with stat or ls -lah? Maybe /dev/root is pointing to the wrong location (or has been created with the wrong mknod parameters). Yes, /dev should be quite dynamic these days, but you never know... back in the kernel 2.4 days I hosed one of my Gentoo systems and only got it booting again after I recreated some device nodes by hand.
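For example, from the rescue system (a sketch; the /mnt mount point is assumed, and the reported device numbers will differ on your system):

```shell
# Inspect the static device nodes on the mounted CentOS root.
ls -lah /mnt/dev/sda1 /mnt/dev/root
# stat can print the major/minor device numbers explicitly (in hex).
stat -c '%n: type=%F major=%t minor=%T' /mnt/dev/sda1
```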
Edit after grub.conf was added to original question:
The very first line (boot=/dev/sda) of your grub.conf looks suspicious. Try commenting it out and rebooting.
If that did not work, you might also want to check the /dev/sda1 device node with ls -lah and note the major/minor numbers it reports (values similar to 104, 1). Then recreate /dev/root with mknod, substituting your own major and minor values.
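A sketch of what that looks like (the 104, 1 pair is just the answer's example and the ls output line is illustrative; use the major/minor numbers that ls -lah actually reports for your /dev/sda1):

```shell
# Example ls -lah output for a block device -- "104, 1" is the major, minor pair:
#   brw-r----- 1 root disk 104, 1 Jan  1 00:00 /dev/sda1
# Recreate /dev/root as a block (b) device with that same major/minor pair:
mknod /dev/root b 104 1
```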
Hope this helps!