CentOS Superblock corruption

corruption filesystems storage-area-network

I'm on CentOS 5.2 and I'm having a problem booting my two database servers. Our IT department performed a SAN upgrade over the weekend and now I can't boot. They say the upgrade went fine, but obviously something has happened. The error I get at boot time is this:

fsck.ext3: No such file or directory while trying to open /dev/VolGroup01/db

I have an external consultant who is looking at it and saying it's a superblock problem which can't be fixed, but I thought these were recoverable (according to this, at least: http://www.cyberciti.biz/faq/recover-bad-superblock-from-corrupted-partition/).

Anyone have any suggestions or pointers? Also, for future reference, what should I keep backups of beyond my data?


Utterly desperate and willing to pay for recovery at this point.

Best Answer

I am willing to bet that the SAN has shifted the beginning of the physical disk by a few bytes. I've seen this before. It's a bitch to get your files off a disk this has happened to, but it is possible.

If you run 'fdisk -l', do you get messages about the starting cylinders on the device not marrying up? It's usually in brackets around each partition declaration.

Do you manage to find the LVM groups but not the disk itself? Is the LVM device made up of multiple SAN disks and just one is affected?

The following script searches /dev/sdb for the offset at which your LVM partition starts. No guarantees it will find anything. If it does, you might be in a good position to recover your data.

#!/usr/bin/python
import sys
def BoyerMooreHorspool(pattern, text):
    m = len(pattern)
    n = len(text)
    if m > n: return -1
    skip = []
    for k in range(256): skip.append(m)
    for k in range(m - 1): skip[ord(pattern[k])] = m - k - 1
    skip = tuple(skip)
    k = m - 1
    while k < n:
        j = m - 1; i = k
        while j >= 0 and text[i] == pattern[j]:
            j -= 1; i -= 1
        if j == -1: return i + 1
        k += skip[ord(text[k])]
    return -1

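if __name__ == '__main__':
   # Python 2 script: scans the first 2 GiB of /dev/sdb for the LVM2 label.
   giveup = 1024*1024*1024*2
   chunk = 1048576
   pattern = "\x00\x00\x00LVM2"
   lba_offset = 0   # absolute byte offset of the start of the current chunk
   tail = ""        # end of the previous chunk, so a match that straddles
                    # a chunk boundary is not missed
   disk = open('/dev/sdb', 'rb')   # binary mode: this is raw disk data
   while lba_offset < giveup:
      text = disk.read(chunk)
      if not text: break   # end of device reached before giving up
      s = BoyerMooreHorspool(pattern, tail + text)
      if s > -1:
         # Assuming the label sits in its usual place (second 512-byte
         # sector), "LVM2 001" is at byte 512+24 = 536 of the physical
         # volume, so the three zero bytes before it start at 533.
         print "Try offset: %d" % ((lba_offset - len(tail) + s) - 533)
         sys.exit(0)
      tail = text[-(len(pattern) - 1):]
      lba_offset += len(text)
   print "Unable to find LVM position!"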

Can you post the output you get?
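
If the script does print an offset, it is worth sanity-checking it before building anything on top of it. The sketch below is one way to do that, not a definitive test. It assumes the LVM2 label header is in its default location (the second 512-byte sector of the physical volume), where it starts with "LABELONE" and carries the type string "LVM2 001" at byte 24. The function name check_lvm_label is made up for illustration, and the code is written for Python 3, so it would have to run from a rescue environment or another machine that can see the disk rather than the stock Python on CentOS 5.2.

```python
SECTOR = 512  # assumed sector size; the label normally sits in sector 1


def check_lvm_label(path, pv_start):
    """Return True if an LVM2 label header is found one sector after pv_start.

    path     -- device or image file to read, e.g. '/dev/sdb'
    pv_start -- candidate byte offset of the physical volume start
    """
    with open(path, "rb") as disk:
        disk.seek(pv_start + SECTOR)
        header = disk.read(32)
    if len(header) < 32:
        # Candidate offset is at or beyond the end of the device.
        return False
    # Bytes 0-7 hold the label id, bytes 24-31 the type string.
    return header[0:8] == b"LABELONE" and header[24:28] == b"LVM2"


# Example call (read-only; needs root for a real device):
#   print(check_lvm_label("/dev/sdb", 1234))   # 1234 = offset printed by the script
```

The check only reads from the device. A True result means the label header is where it should be relative to that offset; it says nothing about whether the LVM metadata or the ext3 filesystem behind it are intact.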