Linux – Quantifying the effects of partition mis-alignment

io, linux, partition, partition-alignment, performance

I'm experiencing some significant performance issues on an NFS server. I've been reading up a bit on partition alignment, and I think my partitions are misaligned, but I can't find anything that tells me how to actually quantify the effects. Some of the general information I found suggests the performance penalty can be quite high (upwards of 60%), while other sources say it's negligible. What I want to do is determine whether partition alignment is a factor in this server's performance problems, and if so, to what degree.

So I'll put my info out here, and hopefully the community can confirm whether my partitions are indeed misaligned and, if so, help me put a number on the performance cost.
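If it helps, here's one way I'm thinking of putting a number on it myself (a sketch only: fio isn't in the CentOS 5 base repos, so this assumes it's installed, e.g. from EPEL; it issues reads only, so it's non-destructive). The idea is to compare direct random reads at offsets aligned to the controller's 64 kB stripe size (see Edit 1 below) against reads deliberately shifted by half a stripe:

# 64 kB random reads at stripe-aligned offsets vs. shifted by 32 kB.
# direct=1 bypasses the page cache so the controller and disks do the work.
fio --name=aligned --filename=/dev/sdb --direct=1 --rw=randread --bs=64k --offset=0 --size=4g --runtime=60
fio --name=misaligned --filename=/dev/sdb --direct=1 --rw=randread --bs=64k --offset=32k --size=4g --runtime=60

A big gap between the two bandwidth numbers would point at alignment; similar numbers would point the investigation elsewhere.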

Server is a Dell R510 with dual E5620 CPUs and 8 GB RAM. There are eight 15k 3.5” 600 GB drives (Seagate ST3600057SS) configured in hardware RAID-6 with a single hot spare. The RAID controller is a Dell PERC H700 with 512 MB cache (Linux sees this as an LSI MegaSAS 9260). The OS is CentOS 5.6; the home directory partition is ext3, mounted with options “rw,data=journal,usrquota”.

I have the HW RAID configured to present two virtual disks to the OS: /dev/sda for the OS (boot, root and swap partitions), and /dev/sdb for a big NFS share:

[root@lnxutil1 ~]# parted -s /dev/sda unit s print

Model: DELL PERC H700 (scsi)
Disk /dev/sda: 134217599s
Sector size (logical/physical): 512B/512B
Partition Table: msdos

Number  Start    End         Size        Type     File system  Flags
 1      63s      465884s     465822s     primary  ext2         boot
 2      465885s  134207009s  133741125s  primary               lvm

[root@lnxutil1 ~]# parted -s /dev/sdb unit s print

Model: DELL PERC H700 (scsi)
Disk /dev/sdb: 5720768639s
Sector size (logical/physical): 512B/512B
Partition Table: gpt

Number  Start  End          Size         File system  Name  Flags
 1      34s    5720768606s  5720768573s                     lvm

Edit 1
Using the cfq IO scheduler (default for CentOS 5.6):

# cat /sys/block/sd{a,b}/queue/scheduler
noop anticipatory deadline [cfq]
noop anticipatory deadline [cfq]

Chunk size is the same as stripe size, right? If so, then 64kB:

# /opt/MegaCli -LDInfo -Lall -aALL -NoLog
Adapter #0

Number of Virtual Disks: 2
Virtual Disk: 0 (target id: 0)
Name:os
RAID Level: Primary-6, Secondary-0, RAID Level Qualifier-3
Size:65535MB
State: Optimal
Stripe Size: 64kB
Number Of Drives:7
Span Depth:1
Default Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteThrough, ReadAdaptive, Direct, No Write Cache if Bad BBU
Access Policy: Read/Write
Disk Cache Policy: Disk's Default
Number of Spans: 1
Span: 0 - Number of PDs: 7

... physical disk info removed for brevity ...

Virtual Disk: 1 (target id: 1)
Name:share
RAID Level: Primary-6, Secondary-0, RAID Level Qualifier-3
Size:2793344MB
State: Optimal
Stripe Size: 64kB
Number Of Drives:7
Span Depth:1
Default Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteThrough, ReadAdaptive, Direct, No Write Cache if Bad BBU
Access Policy: Read/Write
Disk Cache Policy: Disk's Default
Number of Spans: 1
Span: 0 - Number of PDs: 7

If it's not obvious, virtual disk 0 corresponds to /dev/sda, for the OS; virtual disk 1 is /dev/sdb (the exported home directory tree).

Best Answer

Your partitions are misaligned, and it may be difficult to assess how much performance you're actually losing, because that depends on the type of I/O workload. It might be negligible if your I/O workload is light compared to what your disks can deliver. However, since this is an NFS server, I'm assuming it's not negligible and should be addressed. Some estimates put the performance penalty at 20-30%.

Partition misalignment basically means that one I/O operation at the software level may require two I/O operations at the hardware level. That happens whenever software block boundaries don't fall on hardware boundaries. From what you've written, you have already researched and understood this.

  • Disk = 512-byte sectors
  • RAID = 65536-byte stripes (OK)
  • /dev/sda partition 1 = starting at sector 63 (32256-byte offset)
  • /dev/sda partition 2 = starting at sector 465885 (238533120-byte offset)
  • /dev/sdb partition 1 = starting at sector 34 (17408-byte offset)
  • EXT2/3/4 block size = ?
  • EXT2/3/4 stride size = ?

None of these offsets lines up with the stripe size; see the quick check after this list.
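To make the misalignment concrete, here's a minimal check with plain shell arithmetic (the 65536-byte stripe size comes from the MegaCli output above); a partition is stripe-aligned only when the remainder is 0:

# Byte offset and stripe remainder for each partition start sector.
for start in 63 465885 34; do
    echo "start=$start offset=$(( start * 512 )) remainder=$(( (start * 512) % 65536 ))"
done

This prints remainders of 32256, 47616 and 17408; none of the three partitions starts on a stripe boundary.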

Keep in mind that partition misalignment is different from having your storage subsystem use block sizes that differ from what your software uses. That can also put extra stress on your disks, but it is not really an alignment issue.

Use tune2fs -l /dev/sdaX | grep size to check the block size of your filesystem.
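As for the stride question above: with the 64 kB stripe element from the MegaCli output and a typical 4096-byte ext3 block size (an assumption until tune2fs confirms it), stride = 65536 / 4096 = 16 filesystem blocks, and the full data stripe is 16 x 5 = 80 blocks (RAID-6 over 7 drives leaves 5 data disks). If the filesystem ever gets recreated, the stride can be passed to mke2fs; /dev/VolGroup/share below is a placeholder for the actual logical volume:

# Hypothetical example: ext3 with 4 KiB blocks and a RAID stride of 16 blocks.
mke2fs -j -b 4096 -E stride=16 /dev/VolGroup/share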

According to Red Hat Enterprise Linux's recommendations:

Generally, it is a good idea to align the partitions to a multiple of one MB (1024x1024 bytes). To achieve alignment, start sectors of partitions should always be a multiple of 2048, and end sectors should always be a multiple of 2048, minus 1. Note that the first partition can not start at sector 0, use sector 2048 instead.
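Applied to this hardware, a 1 MiB-aligned layout for the data disk would look something like the sketch below. This is destructive (it rewrites the partition table), so it only makes sense after the data has been moved off /dev/sdb; the end sector 5720768511 is the largest multiple of 2048, minus 1, that fits in the usable GPT area:

# Recreate /dev/sdb with a single 1 MiB-aligned LVM partition.
parted -s /dev/sdb mklabel gpt
parted -s /dev/sdb unit s mkpart primary 2048 5720768511
parted -s /dev/sdb set 1 lvm on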

It looks like you may be facing moving the data and recreating your partitions, if misalignment is indeed the root cause of your NFS performance issue. However, the whole picture is usually more complex, and I'd try to find evidence that everything else is OK before committing to such a costly rebuild.
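One more check worth doing first, since both partitions are LVM physical volumes: data inside a PV starts at its first physical extent, so the effective on-disk offset is the partition offset plus pe_start. If your LVM2 version exposes the pe_start field, it's quick to inspect:

# Show where the first physical extent starts inside each PV.
pvs -o +pe_start --units k

Whatever layout you end up with, that offset has to be added to the partition start when checking alignment against the 64 kB stripe.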