I'm trying to recover a RAID1 array; both disks are NVMe drives.
At the end of a long, bad day I did a really stupid thing: I erased the first 512 bytes of each NVMe drive, the intention being to disable the boot loader.
It turned out that I erased the partition table as well as the RAID information.
I did make backups of those 512 bytes, but guess what: I made them to the same disks, so they are inaccessible now.
I made copies of the disks with dd to another disk and started trying to recover the data.
I ran testdisk, which found all the partitions:
Disk /dev/nvme0n1 - 512 GB / 476 GiB - CHS 488386 64 32
Current partition structure:
Partition Start End Size in sectors
1 * Linux RAID 1 0 1 32737 63 32 67045376 [rescue:0]
2 P Linux RAID 32769 0 1 33280 63 32 1048576 [rescue:1]
3 P Linux RAID 33281 0 1 488257 63 32 931792896 [rescue:2]
I wrote this partition data to both disks and rebooted, but only the /boot partition (the first one) was recovered.
I tried to assemble the root partition (the third one) with mdadm, but it failed with:
[Sun May 27 11:30:40 2018] md: nvme0n1p3 does not have a valid v1.2 superblock, not importing!
[Sun May 27 11:30:45 2018] md: nvme0n1p3 does not have a valid v1.2 superblock, not importing!
[Sun May 27 13:45:32 2018] md: nvme1n1p1 does not have a valid v1.2 superblock, not importing!
[Sun May 27 13:45:32 2018] md: nvme0n1p1 does not have a valid v1.2 superblock, not importing!
[Sun May 27 13:45:32 2018] md: nvme1n1p3 does not have a valid v1.2 superblock, not importing!
[Sun May 27 13:45:32 2018] md: nvme0n1p3 does not have a valid v1.2 superblock, not importing!
My plan was to somehow mount the root partition from one of the disks, get the sector backup, and restore everything.
But I can't mount /dev/nvme0n1p3; it fails:
# mount /dev/nvme0n1p3 /mnt/arr2
mount: unknown filesystem type 'linux_raid_member'
# mount /dev/nvme0n1p3 /mnt/arr2 -t ext4
mount: /dev/nvme0n1p3 is already mounted or /mnt/arr2 busy
What can be done to get access to the files on /dev/nvme0n1p3?
UPDATE: Thanks to advice from Peter Zhabin, I tried to recover the filesystem on one of the drives, /dev/nvme1n1, with the partitions recovered with the help of testdisk.
I took the data offset from another server with similar (but not identical) disks and partitioning:
losetup --find --show --read-only --offset $((262144*512)) /dev/nvme1n1p3
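The 262144-sector figure is, as far as I know, a common default data offset for mdadm v1.2 metadata on devices of this size (an assumption here, since the real offset on this array was unknown). With 512-byte sectors, the byte offset handed to losetup works out to 128 MiB:

```shell
# 262144 sectors * 512 bytes/sector = byte offset passed to losetup --offset
echo $(( 262144 * 512 ))            # -> 134217728
echo $(( 262144 * 512 / 1048576 )) # -> 128 (MiB)
```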
fsck complained about the wrong partitioning (or superblock), but gave FS statistics that look really close to what was on the drive:
fsck.ext3 -n -v /dev/loop1
e2fsck 1.43.3 (04-Sep-2016)
Warning: skipping journal recovery because doing a read-only filesystem check.
The filesystem size (according to the superblock) is 116473936 blocks
The physical size of the device is 116441344 blocks
Either the superblock or the partition table is likely to be corrupt!
Abort? no
/dev/loop1 contains a file system with errors, check forced.
Pass 1: Checking inodes, blocks, and sizes
Inode 26881053 extent tree (at level 2) could be narrower. Fix? no
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Free blocks count wrong (20689291, counted=20689278).
Fix? no
Free inodes count wrong (25426857, counted=25426852).
Fix? no
3695703 inodes used (12.69%, out of 29122560)
30256 non-contiguous files (0.8%)
442 non-contiguous directories (0.0%)
# of inodes with ind/dind/tind blocks: 0/0/0
Extent depth histogram: 3616322/1294/3
95784645 blocks used (82.24%, out of 116473936)
0 bad blocks
29 large files
3510238 regular files
107220 directories
2 character device files
0 block device files
53 fifos
1248 links
78147 symbolic links (77987 fast symbolic links)
39 sockets
------------
3696947 files
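A side note on the size mismatch fsck reported: the loop device is short of what the superblock expects by roughly 127 MiB, suspiciously close to the 128 MiB offset borrowed from the other server. This might suggest the real data offset on this array was somewhat different. The arithmetic, using the numbers from the fsck output above:

```shell
# Shortfall between the filesystem's expected size and the loop device,
# in 4 KiB ext4 blocks and in MiB (figures taken from the fsck output):
echo $(( 116473936 - 116441344 ))                    # -> 32592 blocks short
echo $(( (116473936 - 116441344) * 4096 / 1048576 )) # -> 127 (MiB short)
```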
However, I was unable to mount the filesystem:
root@rescue /mnt/backups # mount -o ro /dev/loop1 /mnt/reco/
mount: wrong fs type, bad option, bad superblock on /dev/loop1,
missing codepage or helper program, or other error
What can be done next? It feels like the data is so close…
Best Answer
Okay, finally I managed to restore the MBR. As I mentioned above, I had backed up the MBRs of both RAID drives to the drives themselves; it was done with dd.
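Such a backup would look roughly like this (paths hypothetical; demonstrated here against a scratch file rather than the real /dev/nvme0n1):

```shell
# Stand-in "disk": four zeroed sectors in a scratch file.
dd if=/dev/zero of=/tmp/disk.img bs=512 count=4 2>/dev/null
# Save the first 512 bytes (on the real system: if=/dev/nvme0n1).
dd if=/tmp/disk.img of=/tmp/disk.mbr bs=512 count=1 2>/dev/null
stat -c %s /tmp/disk.mbr    # the backup is exactly one sector: 512
```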
I thought it would be possible to look for the MBR backup files in the drive images. I had saved the MBR sector of the similar server to the file mbrb.backup, and it contained a recognizable string.
Since I didn't manage to find a way to search for a string containing null bytes in a 512 GB image, I did a grep search for the individual substrings, testing it first on the working MBR.
I then searched for the same string in the raw drive image, and it found some 20+ offsets containing the string.
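The search can be done with GNU grep's binary-safe options. A minimal demo on a scratch file (the string 'GRUB' stands in for the actual substring; on the real system the input would be the raw disk image):

```shell
# --text treats the binary input as text; --byte-offset prints where
# each --only-matching hit starts.
printf 'junk...GRUB...more junk' > /tmp/img
grep --text --byte-offset --only-matching 'GRUB' /tmp/img   # -> 7:GRUB
```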
Then I saved the 512 bytes at each of those offsets with dd, computing the sector numbers with bc.
I got some 20+ files, most of them identical to each other, perhaps pieces of GRUB. Then I compared them to the MBR saved from the working server; the last one looked very similar, so I wrote it into the MBR of the broken disk.
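Writing a candidate back is the mirror image of the backup; conv=notrunc matters so dd patches the first sector in place instead of truncating the output. A scratch-file demo (on the real system of= would be the whole-disk device, e.g. /dev/nvme1n1):

```shell
# 512-byte stand-in candidate and a zeroed stand-in "disk".
printf 'CANDIDATE%503s' '' > /tmp/cand.mbr
dd if=/dev/zero of=/tmp/disk.img bs=512 count=4 2>/dev/null
# Patch sector 0 in place, leaving the rest of the "disk" untouched.
dd if=/tmp/cand.mbr of=/tmp/disk.img bs=512 count=1 conv=notrunc 2>/dev/null
cmp -s -n 512 /tmp/cand.mbr /tmp/disk.img && echo restored   # -> restored
```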
I checked the result with testdisk; interestingly, it complained that the partitions were wrong, but everything else looked very promising.
So I took the risk and wrote the same MBR to the other RAID member, /dev/nvme0n1. After a reboot the md devices came up and my data was back. It feels like a miracle.