Linux – Remake SW RAID1 from a new HDD and an old HDD with bad blocks

debian linux mdadm raid1 software-raid

I have a SW RAID1 and I just replaced /dev/sda with a new HDD as the old one failed.
Now, upon trying to recreate the RAID array, I discovered that the "good" HDD (/dev/sdb) has bad blocks which prevent mdadm from resyncing the array.

While I could make backups, replace /dev/sdb as well, and reinstall the server completely, I was wondering whether there is any way to "trick" mdadm into resyncing the RAID array so that I can then replace /dev/sdb with a new HDD.
From what I can tell, the bad blocks are located in an unused area of /dev/sdb which is only used when trying to recreate the RAID array.

Best Answer

Can you verify whether the bad sectors underlying the affected blocks have been reallocated to the disk's spare-sector area? A bad sector should be reallocated when a write to it fails. Check with smartctl:

 smartctl -a /dev/sdb | grep -i reallocated

The last column shows the total number of reallocated sectors. If it is zero, try to read the bad sector:

hdparm --read-sector XXXXXXXX /dev/sdb

It should return an I/O error. If it does not, I would recommend skipping the next step.

The error means the sector has not been reallocated yet, so you can try to force reallocation by writing to it. Remember that any data stored in this sector will be lost after this step:

hdparm --write-sector XXXXXXXX --yes-i-know-what-i-am-doing /dev/sdb

By the way, the sector number XXXXXXXX can be obtained from the kernel messages (the dmesg command or /var/log/messages). Since you hit bad blocks during resynchronisation, there should be related messages similar to:

... end_request: I/O error, dev sdb, sector 1261071601
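As a minimal sketch, the sector numbers can be pulled out of the kernel log with a small filter. The function name failed_sectors is hypothetical, and the pattern assumes the message wording shown above ("I/O error, dev sdb, sector N"); other kernel versions may phrase the line differently.

```shell
# Print the unique sector numbers from kernel I/O error lines for one device.
# Reads log text on stdin; $1 is the device name without /dev/ (e.g. sdb).
# failed_sectors is a hypothetical helper name, not part of any tool.
failed_sectors() {
    dev=$1
    sed -n "s|.*I/O error, dev $dev, sector \([0-9][0-9]*\).*|\1|p" | sort -n -u
}

# Usage (reading the kernel log may need root):
#   dmesg | failed_sectors sdb
```

Each number printed is a candidate for XXXXXXXX in the hdparm commands.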

Then check with smartctl again. Did the counter increase? If so, try to read the sector with hdparm once more. It should now read without any error, since it is supposed to have been reallocated. Done.
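The before/after check of the counter can be sketched as below. This assumes the drive reports the reallocated-sector count as SMART attribute ID 5 with the raw value in the last column of smartctl -A output, which is common but not guaranteed for every drive. The function name reallocated_count is hypothetical.

```shell
# Print the raw value (last column) of SMART attribute 5 from
# `smartctl -A` output supplied on stdin.
# Assumption: attribute ID 5 is the reallocated-sector count on this drive.
reallocated_count() {
    awk '$1 == 5 { print $NF }'
}

# Usage (needs root and smartmontools):
#   before=$(smartctl -A /dev/sdb | reallocated_count)
#   ... run the hdparm --write-sector step here ...
#   after=$(smartctl -A /dev/sdb | reallocated_count)
#   [ "$after" -gt "$before" ] && echo "sector was reallocated"
```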

Finally, you can continue with mdadm and with adding the disk to your degraded mirror.
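A rough sketch of that last step follows. It only prints the commands rather than running them. The array name /dev/md0, the partition number, and the use of an MBR partition table (copied with sfdisk) are assumptions, since the question does not state them. The function name readd_plan is hypothetical.

```shell
# Print (do not run) the commands for adding the new disk to the degraded mirror.
# $1 = array, $2 = surviving disk, $3 = new disk, $4 = partition number.
# All four values are assumptions to be adapted to the real system.
readd_plan() {
    md=$1 good=$2 new=$3 part=$4
    echo "sfdisk -d $good | sfdisk $new"        # copy the partition table to the new disk
    echo "mdadm --manage $md --add $new$part"   # add the new partition; the resync starts
    echo "cat /proc/mdstat"                     # watch the resync progress
}

# Usage:
#   readd_plan /dev/md0 /dev/sdb /dev/sda 1
```

Review the printed commands and run them by hand as root once the device names are confirmed.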

Related Solutions

Lvm – Linux Software RAID1: How to boot after (physically) removing /dev/sda? (LVM, mdadm, Grub2)

You need to install GRUB to the MBR of both drives, and you need to do it in a way that GRUB considers each disk to be the first disk in the system.

GRUB uses its own enumeration for disks, which is abstracted from what the Linux kernel presents. You can change which device it thinks is the first disk (hd0), by using a "device" line in the grub shell, like so:

device (hd0) /dev/sdb

This tells grub that, for all subsequent commands, treat /dev/sdb as the disk hd0. From here you can complete the installation manually:

device (hd0) /dev/sdb
root (hd0,0)
setup (hd0)

This installs GRUB into the MBR of the disk it considers to be hd0, which you've just set as /dev/sdb, using the first partition on that disk, (hd0,0), as its root.

I do the same for both /dev/sda and /dev/sdb, just to be sure.
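Repeating the same three commands for each disk could be scripted roughly as follows. This is a sketch that assumes GRUB legacy (the grub shell with its --batch option) and that the GRUB files live on the first partition of each disk. The function name grub_setup_commands is hypothetical.

```shell
# Emit the GRUB legacy shell commands that install the boot loader on one
# disk while treating that disk as hd0. $1 is the disk, e.g. /dev/sdb.
grub_setup_commands() {
    printf 'device (hd0) %s\nroot (hd0,0)\nsetup (hd0)\n' "$1"
}

# Usage (needs root; assumes the GRUB legacy `grub` binary is installed):
#   for d in /dev/sda /dev/sdb; do
#       { grub_setup_commands "$d"; echo quit; } | grub --batch
#   done
```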

Edited to add: I always found the Gentoo Wiki handy, until I did this often enough to commit it to memory.

Build and migrated to software raid (mdadm) on GPT disk, now can’t assemble array

mdadm doesn't recognize partitions, the Linux kernel does. A software RAID array doesn't need to know or care what type of partitions the disk uses, because it just uses the block devices that the kernel provides for the partitions. I'm using mdadm arrays on GPT disks on several computers and they work fine.

The partition layout you described doesn't make sense:

/dev/sda
 /dev/sda1 <- GPT type partition
  /dev/sda1 <- exists within the GPT part, member of md127
  /dev/sda2 <- exists within the GPT part, empty

/dev/sdb
 /dev/sdb1 <- GPT type partition
  /dev/sdb1 <- exists within the GPT part, member of md127

In particular, it looks like you're saying that sda2 is located within sda1. Partitions don't exist within other partitions, and GPT is a characteristic of the whole-disk device, not a partition. I think what you actually mean is:

/dev/sda <- GPT disk
 /dev/sda1 <- member of md127
 /dev/sda2 <- empty

/dev/sdb <- GPT disk
 /dev/sdb1 <- member of md127

However, your blkid output says that /dev/sda1 currently contains an Ext4 filesystem, not a RAID superblock — it's not a member of md127. It's not clear how that filesystem got there, since you said that you were using it as a RAID component, but since your story is long and full of twists, I suspect there may have been points where things happened that you didn't realize had happened. My suggestion at this point is:

1. Assemble the array in degraded mode using just /dev/sdb1. Check that it contains your data; if not, check whether /dev/sda1 somehow contains an intact filesystem with your data, otherwise I hope you have a backup.
2. Make a backup of all your data, if you don't have one already.
3. Completely wipe /dev/sda: dd if=/dev/zero of=/dev/sda bs=1M. Then use gdisk to recreate the partition(s).
4. Create a new degraded array using only a partition on sda. Make a filesystem in it, and copy your data into it.
5. Disassemble the array that's using sdb1, and completely wipe /dev/sdb: dd if=/dev/zero of=/dev/sdb bs=1M. Then use gdisk to recreate the partition.
6. Add /dev/sdb1 to the new array and let it sync.
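The steps above can be sketched as a command plan. The function below only prints the commands, because several of them are destructive. The new array name and the choice of ext4 are assumptions, and the backup, gdisk, and data-copy steps are left as comments because they depend on the system. The function name migration_plan is hypothetical.

```shell
# Print (do not run) the mdadm/dd commands for the migration described above.
# $1 = existing array (e.g. /dev/md127), $2 = name for the new array (assumed).
migration_plan() {
    old_md=$1 new_md=$2
    cat <<EOF
mdadm --assemble --run $old_md /dev/sdb1
# check the data and make a backup before going further
dd if=/dev/zero of=/dev/sda bs=1M
# recreate the partition(s) on /dev/sda with gdisk
mdadm --create $new_md --level=1 --raid-devices=2 /dev/sda1 missing
mkfs.ext4 $new_md
# mount $new_md and copy the data into it
mdadm --stop $old_md
dd if=/dev/zero of=/dev/sdb bs=1M
# recreate the partition on /dev/sdb with gdisk
mdadm --manage $new_md --add /dev/sdb1
EOF
}

# Usage:
#   migration_plan /dev/md127 /dev/md0
```

The keyword missing in the --create line is what makes the new array start out degraded with a single member.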

As for installing GRUB, it depends on whether your machine supports EFI (and whether you're using it for booting). If you're using EFI, you need to make an EFI system partition somewhere; it should be roughly 100MB, formatted FAT32. Then you'd install the EFI version of GRUB. I won't go into too much detail on this; EFI booting is a topic for a separate question.

If you're not using EFI to boot, you need to make a "BIOS Boot" partition somewhere on the disk that you'll be installing GRUB on. (This is partition type code ef02 in gdisk.) The partition can be tiny; 1MB is plenty. GRUB will use this to store the boot code that it would have written to sectors 1 through 62 on an MBR disk. (On an MBR disk, those sectors are typically unallocated since the first partition typically begins at sector 63, but on a GPT disk, the partition table is located in that area.) GRUB should automatically notice that the disk you're installing it to contains a BIOS Boot partition, and put its boot code there instead of in sectors 1-62.
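For the non-EFI case, creating the BIOS Boot partition can also be done non-interactively with sgdisk, the scriptable companion to gdisk. The sketch below only prints the commands. The partition number is an assumption, and grub-install assumes GRUB 2. The function name bios_boot_plan is hypothetical.

```shell
# Print (do not run) the commands that add a 1 MiB BIOS Boot partition
# (type code ef02) and install GRUB on the disk.
# $1 = disk, $2 = a free partition number (both are assumptions).
bios_boot_plan() {
    disk=$1 partnum=$2
    echo "sgdisk --new=$partnum:0:+1M --typecode=$partnum:ef02 $disk"
    echo "grub-install $disk"
}

# Usage:
#   bios_boot_plan /dev/sda 2
```

A start value of 0 tells sgdisk to use the first available free space on the disk.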
