Linux RAID – Mdadm and 4K Sectors (Advanced Format)

advanced-format, linux, mdadm, raid, storage

There are numerous questions on Serverfault about aligning 4k-sector disks, but one thing is still not clear to me.

I successfully aligned my RAID1+LVM. One of the things I did was use mdadm superblock version 1.0 (which stores the superblock at the end of the disk).

The manpage says this:

The different sub-versions store the superblock at different locations on the device, either at the end (for 1.0), at the start (for 1.1) or 4K from the start (for 1.2). "1" is equivalent to "1.0". "default" is equivalent to "1.2".

Is the 1.2 version, which is the default, made for 4k-sector drives? The way I see it, it is not, because 4k from the start plus the length of the superblock is not a multiple of 4k (the superblock is about 200 bytes long, if I remember correctly).

Any insight into this is welcome.

Edit:

It was answered below that the mdadm 1.1 and 1.2 superblocks are meant for 4k alignment. I just created a whole-device RAID1 with:

mdadm --create /dev/md4 -l 1 -n 2 /dev/sdb /dev/sdd

Then I created a volume group on it:

vgcreate universe2 /dev/md4
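
(To actually get a logical volume, one would then carve it out of that group; the name and size here are just placeholders, not necessarily what I used:)

lvcreate -L 100G -n mylv universe2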

The array is syncing at 16 MB/s:

md4 : active raid1 sdd[1] sdb[0]
      1465137424 blocks super 1.2 [2/2] [UU]
      [>....................]  resync =  0.8% (13100352/1465137424) finish=1471.6min speed=16443K/sec

So I doubt it is properly aligned.

(The disks are 1.5 TB WD EARS drives. I had them in my desktop PC before, where they synced at about 80 MB/s.)
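
One thing worth ruling out before blaming alignment is the kernel's own resync throttle; these are standard md sysctls (values in KiB/s), so a quick check looks like:

cat /proc/sys/dev/raid/speed_limit_min    # typically 1000
cat /proc/sys/dev/raid/speed_limit_max    # typically 200000, far above 16 MB/s

With the usual defaults, throttling would not explain a 16 MB/s ceiling here.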

Edit2:

Here's the --examine output:

# mdadm --examine /dev/sdb
/dev/sdb:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 79843828:7d939cce:1c8f0b32:cf339870
           Name : brick:4  (local to host brick)
  Creation Time : Sat Jul  9 10:47:33 2011
     Raid Level : raid1
   Raid Devices : 2

 Avail Dev Size : 2930275120 (1397.26 GiB 1500.30 GB)
     Array Size : 2930274848 (1397.26 GiB 1500.30 GB)
  Used Dev Size : 2930274848 (1397.26 GiB 1500.30 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : active
    Device UUID : dd2e3b5f:33214b96:1cb88169:25deb050

    Update Time : Sat Jul  9 10:49:06 2011
       Checksum : 4f7cd785 - correct
         Events : 1


   Device Role : Active device 0
   Array State : AA ('A' == active, '.' == missing)

The data offset is 2048 sectors, which is divisible by 8, so one would think it's OK. The volume group has a physical extent size of 4 MiB, which is likewise a multiple of 4k. But that wouldn't even matter, because the resync is not related to what the device contains.
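For anyone who wants to repeat the arithmetic, a minimal check (the offsets reported by --examine are in 512-byte sectors, and the device name is just this example's):

mdadm --examine /dev/sdb | grep -i offset
echo $((2048 % 8))    # 0 -> the data area starts on a 4 KiB boundary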

Another edit: it doesn't appear to be an alignment issue, since hdparm -t shows a very low read speed for one of the disks (30 MB/s). Something else is amiss.
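
The test in question was hdparm's buffered sequential read, run on each member disk separately:

hdparm -t /dev/sdb    # these disks managed ~80 MB/s elsewhere;
hdparm -t /dev/sdd    # ~30 MB/s on one member points at that disk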

Final edit: I never remembered to update this post when I found the answer. Everything was nicely aligned; one of the disks was broken. Apparently it was on its last legs, and eventually even that gave out. A replacement disk worked fine.
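
For completeness: a failing member like that can usually be confirmed with smartmontools before it dies outright (assuming the package is installed; device name as above):

smartctl -H /dev/sdb    # overall health self-assessment
smartctl -a /dev/sdb | grep -i -E 'reallocated|pending'    # classic bad-sector counters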

Best Answer

Yes, it is made for 4k sector alignment.

With 1.1 and 1.2 superblocks, space is reserved at the start of each disk so that the superblock doesn't get trampled. The superblock creation code forces this reserved space to be a multiple of 4kB. All physical reads are offset from the end of this reserved space, not from the end of the superblock. This therefore preserves the alignment for any sector size that divides evenly into 4kB.

If you're interested, here is the proof from the mdadm source code (super1.c). Note that reserved is counted in 512-byte sectors, so clearing the low three bits rounds it down to a multiple of 8 sectors, i.e. 4kB:

/* force 4K alignment */
reserved &= ~7ULL;
sb->data_offset = __cpu_to_le64(reserved);

And this data_offset parameter is used by the RAID1 code in the kernel to offset the physical reads, e.g. in the read path:

read_bio->bi_sector = r1_bio->sector + mirror->rdev->data_offset
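
Tying this back to the --examine output in the question: a 2048-sector data offset is a 4 KiB-aligned byte address, as quick shell arithmetic shows:

echo $((2048 * 512))           # 1048576 bytes = 1 MiB
echo $((2048 * 512 % 4096))    # 0 -> a multiple of 4 KiB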