How to Recover Data from Software RAID1 When MBR Is Lost on Both Drives

linux, mbr, raid1, software-raid

I'm trying to recover a RAID1 array; both disks are NVMe flash drives.

I did a really stupid thing at the end of a long and bad day: I erased the first 512 bytes of each NVMe drive, with the intention of disabling the boot loader.
It turned out that I erased the partition table as well as the RAID information.
I did make backups of those 512 bytes, but guess what: I saved them to the same disks, so they are inaccessible now.

I made copies of the disks with dd to another disk before doing anything else.
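
The copies were plain dd images onto the spare disk, roughly like this (the destination paths and dd options shown are approximate, not the exact commands):

dd if=/dev/nvme0n1 of=/mnt/backups/nvme0n1.img bs=1M conv=noerror,sync status=progress
dd if=/dev/nvme1n1 of=/mnt/backups/nvme1n1.img bs=1M conv=noerror,sync status=progress
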
Then I started trying to recover the data: I ran testdisk, which found all the partitions:

Disk /dev/nvme0n1 - 512 GB / 476 GiB - CHS 488386 64 32
Current partition structure:
    Partition                  Start        End    Size in sectors

1 * Linux RAID               1   0  1 32737  63 32   67045376 [rescue:0]
2 P Linux RAID           32769   0  1 33280  63 32    1048576 [rescue:1]
3 P Linux RAID           33281   0  1 488257  63 32  931792896 [rescue:2]

I wrote this partition data to both disks and rebooted, but only the /boot partition (the first one) came back.
I tried to assemble the root partition (the third one) with mdadm, but it failed with:

[Sun May 27 11:30:40 2018] md: nvme0n1p3 does not have a valid v1.2 superblock, not importing!
[Sun May 27 11:30:45 2018] md: nvme0n1p3 does not have a valid v1.2 superblock, not importing!
[Sun May 27 13:45:32 2018] md: nvme1n1p1 does not have a valid v1.2 superblock, not importing!
[Sun May 27 13:45:32 2018] md: nvme0n1p1 does not have a valid v1.2 superblock, not importing!
[Sun May 27 13:45:32 2018] md: nvme1n1p3 does not have a valid v1.2 superblock, not importing!
[Sun May 27 13:45:32 2018] md: nvme0n1p3 does not have a valid v1.2 superblock, not importing!
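
The assemble attempt was roughly of this form (the md device name here may not be exact):

mdadm --assemble /dev/md2 /dev/nvme0n1p3 /dev/nvme1n1p3
mdadm --examine /dev/nvme0n1p3     # shows whether any md superblock is left on a member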

My plan was to somehow mount the root partition from one of the disks, get the sector backups, and restore everything.

But I can't mount /dev/nvme0n1p3; it fails:

# mount /dev/nvme0n1p3  /mnt/arr2
mount: unknown filesystem type 'linux_raid_member'

# mount /dev/nvme0n1p3  /mnt/arr2 -t ext4
mount: /dev/nvme0n1p3 is already mounted or /mnt/arr2 busy
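
The 'linux_raid_member' message means libblkid still detects an md signature on the partition, and the 'already mounted or busy' error usually means something (often a half-assembled, inactive md array) is still holding the device. Two harmless checks (nothing here writes to the disks):

blkid -p /dev/nvme0n1p3
cat /proc/mdstat                 # shows whether an inactive array still claims the member
wipefs /dev/nvme0n1p3            # without -a this only lists signatures, it does not erase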

What can be done to get access to the files on /dev/nvme0n1p3?

UPDATE: Thanks to advice from Peter Zhabin, I tried to recover the filesystem on one of the drives, /dev/nvme1n1, using the partitions recovered with the help of testdisk.

I took the offset from another server with similar (but not identical) disks and partitioning:

 losetup --find --show --read-only --offset $((262144*512)) /dev/nvme1n1p3 
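
The 262144 sectors is a common v1.2 data offset; on that other server it can be read from a healthy array member like this (the partition name is just an example):

mdadm --examine /dev/sdb3 | grep -i 'data offset'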

fsck complained about a wrong partition size (or superblock), but gave filesystem statistics that look really close to what was on the drive:

 fsck.ext3 -n -v /dev/loop1

    e2fsck 1.43.3 (04-Sep-2016)
    Warning: skipping journal recovery because doing a read-only filesystem check.
    The filesystem size (according to the superblock) is 116473936 blocks
    The physical size of the device is 116441344 blocks
    Either the superblock or the partition table is likely to be corrupt!
    Abort? no

    /dev/loop1 contains a file system with errors, check forced.
    Pass 1: Checking inodes, blocks, and sizes
    Inode 26881053 extent tree (at level 2) could be narrower.  Fix? no

    Pass 2: Checking directory structure
    Pass 3: Checking directory connectivity
    Pass 4: Checking reference counts
    Pass 5: Checking group summary information
    Free blocks count wrong (20689291, counted=20689278).
    Fix? no

    Free inodes count wrong (25426857, counted=25426852).
    Fix? no


         3695703 inodes used (12.69%, out of 29122560)
           30256 non-contiguous files (0.8%)
             442 non-contiguous directories (0.0%)
                 # of inodes with ind/dind/tind blocks: 0/0/0
                 Extent depth histogram: 3616322/1294/3
        95784645 blocks used (82.24%, out of 116473936)
               0 bad blocks
              29 large files

         3510238 regular files
          107220 directories
               2 character device files
               0 block device files
              53 fifos
            1248 links
           78147 symbolic links (77987 fast symbolic links)
              39 sockets
    ------------
         3696947 files

However, I was unable to mount the filesystem:

 root@rescue /mnt/backups # mount -o ro /dev/loop1 /mnt/reco/
 mount: wrong fs type, bad option, bad superblock on /dev/loop1,
   missing codepage or helper program, or other error
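
One thing worth noting: fsck says the filesystem is 116473936 blocks while the loop device only has 116441344, so the recovered partition ends about 127 MiB too early (assuming 4 KiB blocks), and the kernel refuses to mount a filesystem that claims to extend past the end of its device. A possible workaround is to set the loop device up on the whole disk instead of on the too-short partition; the combined offset below (partition start taken from the testdisk CHS values, plus the assumed 262144-sector data offset) is my own arithmetic, so treat it as a sketch:

losetup --find --show --read-only --offset $(( (33281*64*32 + 262144) * 512 )) /dev/nvme1n1
mount -o ro,noload /dev/loop2 /mnt/reco/     # noload skips journal replay; the loop name is whatever losetup printed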

What can be done next? It feels like the data is so close…

Best Answer

Okay, finally I managed to restore the MBR. As I mentioned above, I had backed up the MBRs of both RAID drives, to the drives themselves. That was done with the dd command:

dd if=/dev/nvme0n1 bs=512 count=1 of=nvme0n1.bootsector.backup
dd if=/dev/nvme1n1 bs=512 count=1 of=nvme1n1.bootsector.backup 
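
Had those backup files been reachable, restoring would simply have been the reverse:

dd if=nvme0n1.bootsector.backup of=/dev/nvme0n1 bs=512 count=1
dd if=nvme1n1.bootsector.backup of=/dev/nvme1n1 bs=512 count=1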

I thought it would be possible to look for the MBR backup files inside the drive images. I had saved the MBR sector of the similar server to the file mbrb.backup, and it contained the string:

 "GRUB\20\0Geom\0Hard\20Disk\0Read\0\20Error"

Since I couldn't figure out how to search for a string containing null bytes in a 512 GB image, I did a grep search that looked for the individual substrings, like this on the working MBR:

# dd if=/dev/sdb of=mbrb.backup bs=512 count=1
# strings -t d mbrb.backup | grep -4 -iE 'GRUB' | grep -4 'Geom' | grep -4 'Hard Disk' | grep -4 'Read' | grep -4 'Error'
392 GRUB
398 Geom
403 Hard Disk
413 Read
418  Error
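
(As an aside, GNU grep can match embedded null bytes when the pattern is given as a Perl regex, so the whole marker can in principle be found in one pass, though it is slow on a whole device. The separator bytes below are reconstructed from the offsets above and should be checked against mbrb.backup first:)

grep -obaP 'GRUB \x00Geom\x00Hard Disk\x00Read\x00 Error' /dev/nvme1n1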

I started to look for this string on the raw drive:

# strings -t d /dev/nvme1n1 | grep -4 -iE 'GRUB' | grep -4 'Geom' | grep -4 'Hard Disk' | grep -4 'Read' | grep -4 'Error'

And it found some 20+ offsets with this string. The offsets looked like this:

34368320904 GRUB
34368320910 Geom
34368320915 Hard Disk
34368320925 Read
34368320930  Error

34702932360 GRUB
34702932366 Geom
34702932371 Hard Disk
34702932381 Read
34702932386  Error

and some more results....

Then I saved each of them with dd, computing the skip value (byte offset divided by 512) with bc:

bc 1.06.95
Copyright 1991-1994, 1997, 1998, 2000, 2004, 2006 Free Software Foundation, Inc.
This is free software with ABSOLUTELY NO WARRANTY.
For details type `warranty'.
34368320904/512
67125626

dd if=/dev/nvme1n1 of=recovery_file.34368320904 bs=512 skip=67125626 count=2
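
Sorting through the candidates comes down to a byte-wise comparison against the known-good sector, roughly like this (limited to the first 512 bytes, since each candidate file holds two sectors):

md5sum recovery_file.* mbrb.backup
cmp -l -n 512 recovery_file.34368320904 mbrb.backup | head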

I got some 20+ files; most of them were identical to each other, probably GRUB copies. Then I started comparing them to the MBR I had saved from the working server, roughly as sketched above. The last one looked very similar, so I saved it into the MBR of the broken disk:

 dd if=recovery_file.475173835144 of=/dev/nvme1n1 bs=512 count=1

I checked it with testdisk; interestingly, it complained that the partitions' starting sectors were wrong (CHS and LBA don't match), but everything else looked very promising:

Disk /dev/nvme1n1 - 512 GB / 476 GiB - CHS 488386 64 32
Current partition structure:
 Partition                  Start        End    Size in sectors

 1 P Linux RAID               1   0  1 32768  63 32   67108864 [rescue:0]

Warning: Bad starting sector (CHS and LBA don't match)
 2 P Linux RAID           32769   0  1 33280  63 32    1048576 [rescue:1]

Warning: Bad starting sector (CHS and LBA don't match)
 3 P Linux RAID           33281   0  1 488385  21 16  932053680 [rescue:2]

Warning: Bad starting sector (CHS and LBA don't match)
No partition is bootable

So I took the risk and put the same MBR onto /dev/nvme0n1 as well. After a reboot the md devices came up and my data was back. Looks like a miracle.
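
A quick sanity check after the reboot, to confirm that both mirrors are active on every array (the md names will differ per setup):

cat /proc/mdstat
mdadm --detail /dev/md2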
