I just did the same thing a few weeks ago.
I like the idea of the WD Green 2TB drive, but NewEgg got a batch that had a 50% failure rate. Check the user comments. I went with Seagate 2TB drives.
I had no problem making the partitions as usual, building the RAID 5, putting LVM on top of it, and formatting it as XFS.
You may want to look into XFS over ext4 for larger drives. According to the docs, XFS handles multi-terabyte volumes better than ext4.
This is on a Slackware 13.1 box. The OS is not on the huge RAID 5, but on a RAID 1 with smaller drives.
It all worked perfectly for me.
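For anyone following the same route, here is a minimal sketch of that sequence (RAID 5 over four partitions, LVM on top, XFS on the logical volume). The device names /dev/sd[b-e]1 and the vg_data / lv_data names are just placeholders:

    # Build the RAID 5 array from one partition per drive
    mdadm --create /dev/md0 --level=5 --raid-devices=4 \
          /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1

    # Put LVM on top of the array
    pvcreate /dev/md0
    vgcreate vg_data /dev/md0
    lvcreate -l 100%FREE -n lv_data vg_data

    # Format as XFS and mount
    mkfs.xfs /dev/vg_data/lv_data
    mount /dev/vg_data/lv_data /srv/data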
I would not mix md (software RAID) and lvm (LVM) RAID features. In the spirit of KISS, I'd go with pure md RAID, with LVM on top for snapshots / resizes.
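To illustrate what those LVM features buy you on top of a plain md array -- assuming the array is /dev/md0, it carries a volume group vg_data with a logical volume lv_data, and the filesystem is XFS mounted at /mnt/data (all placeholder names):

    # Grow a logical volume and the XFS filesystem on it (XFS grows online)
    lvextend -L +200G /dev/vg_data/lv_data
    xfs_growfs /mnt/data

    # Take a snapshot before risky changes, and drop it when you're done
    lvcreate -s -L 20G -n lv_data_snap /dev/vg_data/lv_data
    lvremove /dev/vg_data/lv_data_snap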
With 4 disks, going RAID 6 is a Bad Idea (TM). It gives you exactly as much usable space as RAID 10, but with much, much worse performance (you have to calculate two parities and pay a read-modify-write penalty for writes smaller than the stripe size).
RAID 6 gives you marginally better resilience (any 2 disks can fail, whereas in RAID 10 you can only lose one disk per mirror pair) at a high cost. Not worth it.
RAID 10 gives you the best performance possible in this setup. The alternatives:
- 2x 1TB RAID 1 sets with LVM concatenation -- concurrent I/O performance depends on whether you hit the same or different disk pairs. Solved by RAID 10, which spreads your I/O over all disks.
- 2x 1TB RAID 1 sets with LVM striping -- should give performance similar to md RAID 10, but it is a more complicated setup. I love simplicity.
- 1x 2TB RAID 6 (2x 1TB of data striped, plus 2x 1TB of parity) with LVM on top -- worse performance for writes, bad performance with a lost drive.
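For reference, creating the 4-disk RAID 10 (with the "far" layout discussed below) is a single mdadm command, and LVM can go on top exactly as with any other md device. Device names are placeholders:

    # "far 2" layout: two copies of each block, kept far apart on the disks,
    # which helps sequential read throughput
    mdadm --create /dev/md0 --level=10 --layout=f2 --raid-devices=4 \
          /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1

    # Check the layout and watch the initial sync
    mdadm --detail /dev/md0
    cat /proc/mdstat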
Performance characteristics
I'm going to assume RAID 10, because in my opinion it has all the advantages and no downsides in your scenario. You are going to be limited to roughly the performance of a pair of HDDs; in other words, you'll be able to serve/write data at about 2x what a single HDD can do. For streaming you should be able to saturate a 1 Gbps link with no effort (reading or writing). For bursty workloads you are stuck with ~150 IOPS (assuming 7200 rpm SATA drives). RAID 10 will guarantee that the load is spread among all drives for all I/O (unless you are unlucky enough to have an application access data with a stride matching your RAID chunk size), and the RAID 10 "far" layout should give you similar performance no matter which region of the filesystem you are accessing.
A lost drive means a marginal loss in read access time (you lose the "far" layout benefit for the affected mirror pair).
If you expand the storage with another pair of mirrored drives, md will not be able to reshape the existing data to re-stripe it over the new space. Effectively you end up with a RAID 10 + RAID 1 setup, unless you back up, re-create the array, and restore.
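In practice, growing later therefore means adding the new pair as its own RAID 1 and handing it to LVM, roughly like this (device and volume names are placeholders):

    # The new mirrored pair becomes a separate md array
    mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sdf1 /dev/sdg1

    # Extend the existing volume group and logical volume onto it
    pvcreate /dev/md1
    vgextend vg_data /dev/md1
    lvextend -l +100%FREE /dev/vg_data/lv_data
    xfs_growfs /mnt/data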
Best Answer
Be careful! gpt labels, required for disks > 2 TiB, are 34 (512-byte) sectors long. So if you create your first partition immediately after the label, it won't start on a 4 KiB boundary.
GNU parted does not align partitions to 4 KiB by default, probably because many "Advanced Format" drives falsely claim that their physical sectors, not just their logical sectors, are still only 512 B.
So if you're using GNU parted, ensure that each partition starts on an LBA divisible by 8 (logical blocks are still 512 B, so 8 * 512 B = 4 KiB). LBA numbering starts at 0, so start the first partition at "40s".
Also, if you use GRUB, leave room for its second-stage bootstrap. With MS-DOS labels the first partition traditionally starts at sector 63, which leaves enough unused room for GRUB to stash its second stage, but there is no such gap in a gpt label. So make a small partition 1, set its "bios_grub" flag, and then create your "real" partitions after it -- making sure that each and every one begins on an LBA that's a multiple of 8.
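Putting the alignment and bios_grub advice together, a parted session could look roughly like this (the device name, partition names, and the 1 MiB size of the GRUB partition are only examples, not requirements):

    parted /dev/sdb
    (parted) mklabel gpt
    (parted) mkpart grub 40s 2047s      # ~1 MiB set aside for GRUB's core image
    (parted) set 1 bios_grub on
    (parted) mkpart data 2048s 100%     # 2048 is a multiple of 8, so 4 KiB-aligned
    (parted) print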