Filesystem performance degraded during RAID rebuilding

filesystems performance raid

So quick question – our RAID6 array is currently rebuilding and there is a VERY noticeable filesystem performance hit (home directories are NFS mounted on the array).

I'd sort of expect that, given you're rebuilding the array so there's massive read/write burden on the controller, but it occurred to me I don't really have anything to compare this to.

Are serious freezes (5-10 seconds, pretty frequently) expected behavior during a RAID rebuild coupled with heavy read/write usage? Performance takes a noticeable hit during backups and when users are downloading large (multi-GB) files via FTP.

Any thoughts on this would be appreciated. This is hardware RAID6 (LSI 9266-8i) on a 40TB array mounted over NFS locally (i.e. the server is physically very close to the workstations).

Best Answer

First, here is a great resource that outlines rebuild times.

RAID rebuilds and how they work pre and post failure.

Now, as far as my thoughts about the rebuild: we do know that rebuilds make for some really sluggish performance, and rightfully so. As you will see from my link above, a post-failure rebuild has to read from the surviving disks to reconstruct the failed disk's contents onto the replacement, and the controller does this while the server keeps serving its normal reads and writes. Another thing to keep in mind is that operations that would normally take almost no time and very few resources now cost more than usual, on a server that is already taxed. With a pre-failure rebuild (a little better on performance, but not much), you can get lucky: a drive (logical or physical) starts to fail and the RAID rebuilds before end users even know anything had a problem. As an SA you should hopefully have some sort of alerting system, so you shouldn't be surprised by it yourself.
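To put a rough number on how long the array stays in this degraded state, here is a back-of-the-envelope sketch. It assumes the rebuild is limited by how fast the replacement disk can be written, scaled by whatever share of that bandwidth the rebuild gets while users are active. The function name, the 4 TB disk size, the 150 MB/s rate and the 30% share are all illustrative assumptions, not values from this array or from the controller's documentation.

```python
def estimate_rebuild_hours(disk_tb: float, disk_mb_per_s: float, rebuild_share: float) -> float:
    """Crude estimate of rebuild time (hours) for one replaced disk.

    disk_tb        -- capacity of the replaced disk in decimal terabytes (assumed)
    disk_mb_per_s  -- sustained write rate of that disk in MB/s (assumed)
    rebuild_share  -- fraction of that rate the rebuild actually gets, 0 < share <= 1 (assumed)
    """
    if not 0 < rebuild_share <= 1:
        raise ValueError("rebuild_share must be in (0, 1]")
    disk_mb = disk_tb * 1_000_000            # decimal TB -> MB
    effective_rate = disk_mb_per_s * rebuild_share
    return disk_mb / effective_rate / 3600   # seconds -> hours


# Hypothetical example: 4 TB disk, 150 MB/s, rebuild getting 30% vs. 100% of it.
busy = estimate_rebuild_hours(4, 150, 0.3)
idle = estimate_rebuild_hours(4, 150, 1.0)
print(f"busy server: ~{busy:.1f} h, idle server: ~{idle:.1f} h")
```

With these made-up inputs the estimate is roughly a day on a busy server versus about seven and a half hours on an idle one. That is the trade-off in a nutshell: the more you protect user I/O, the longer the sluggish period lasts.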

The 5-10 second freezes you see are definitely normal. They will be even more noticeable if the server you are rebuilding on is any kind of database server that has heavier than usual reads and writes by default. For example, a property management company I used to consult for ran a SQL server whose database end users accessed all day long to view tenant records and write new information to them, and it always had heavy usage.
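Since you mentioned having nothing to compare against, one option is to measure the stalls so you can compare the numbers during the rebuild with the numbers once it finishes. The following is a minimal sketch using only the Python standard library: it times small synced writes in a directory and counts the ones that exceed a threshold. The function name, the probe file name and the defaults are my own choices for illustration, and it only captures write latency as seen from one client.

```python
import os
import time


def probe_write_latency(directory, samples=20, interval_s=1.0,
                        stall_threshold_s=1.0, payload=b"x" * 4096):
    """Time small synced writes in `directory`.

    Returns (latencies, stalls): every measured latency in seconds, and the
    subset at or above `stall_threshold_s`.
    """
    path = os.path.join(directory, ".latency_probe.tmp")  # hypothetical probe file
    latencies = []
    try:
        for _ in range(samples):
            start = time.monotonic()
            with open(path, "wb") as fh:
                fh.write(payload)
                fh.flush()
                os.fsync(fh.fileno())  # force the write out to the storage backend
            latencies.append(time.monotonic() - start)
            time.sleep(interval_s)
    finally:
        # Clean up the probe file even if a write failed part-way through.
        if os.path.exists(path):
            os.remove(path)
    stalls = [t for t in latencies if t >= stall_threshold_s]
    return latencies, stalls
```

Call it from a workstation with the path of an NFS-mounted home directory, note `max(latencies)` and `len(stalls)`, then run it again after the rebuild completes. If the stalls disappear, the rebuild was the cause; if they don't, look elsewhere.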

Another thing I recommend is to get whatever RAID utility (the GUI version) comes with your controller and install it on the operating system, so you can monitor the rebuild without having to reboot into the controller BIOS.

A very small and almost nonexistent issue these days is NFS vs iSCSI. I know you're using NFS, and it used to be that iSCSI would have better overall performance in the case of virtualization. But with recent improvements to hypervisors and hard drives, as well as controllers, NFS is almost identical in performance to iSCSI, so it sounds like you have a very nice SAN.

I'd be happy to answer anything else you need to know, so please feel free to comment.
