Disk or Network Bottleneck and Perfmon

bottleneck network-attached-storage perfmon

I'm trying to see if we're choking our NAS server with RAID 5 on a PERC RAID card.

I've been experimenting with Perfmon and trying to narrow down which performance counters to watch, but I keep running into complications. Take page faults, for example. The obvious counter is Page Faults/sec, but it counts both soft and hard page faults, so you also need to look at the memory paging counters and the read queue length; if both are high, it may be a memory issue. Disk queue length is similar: I read that if it stays above 2 you may have a problem with the disk subsystem, but then I found that the threshold is 2 per disk, so 6 on a 3-disk RAID array is bad but 2 is okay. In other words, I'm finding conflicting information on what to monitor to narrow the problem down.
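To make the per-disk reading of the queue-length rule concrete, here is a minimal Python sketch. It assumes the threshold of 2 per disk is only the rule of thumb quoted above, not an authoritative limit, and that every disk in the array counts as a spindle. The function names and the `per_spindle_limit` parameter are hypothetical, chosen for illustration.

```python
def queue_per_spindle(avg_queue_length: float, spindles: int) -> float:
    """Divide the array-wide average disk queue length by the number of disks."""
    if spindles <= 0:
        raise ValueError("spindles must be a positive integer")
    return avg_queue_length / spindles


def exceeds_rule_of_thumb(avg_queue_length: float, spindles: int,
                          per_spindle_limit: float = 2.0) -> bool:
    """Return True when the per-disk queue length is above the assumed limit.

    The default of 2.0 is the rule of thumb from the question, not a hard limit.
    """
    return queue_per_spindle(avg_queue_length, spindles) > per_spindle_limit


if __name__ == "__main__":
    # A sustained queue of 9 on a 3-disk array is 3 per disk: above the limit.
    print(exceeds_rule_of_thumb(9.0, 3))
    # A sustained queue of 2 on the same array is well under 1 per disk.
    print(exceeds_rule_of_thumb(2.0, 3))
```

Whether a value sitting exactly at the limit counts as a problem is a judgment call; the sketch only flags values strictly above it, and only sustained averages are meaningful, not momentary spikes.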

So I'm asking other admins what set of counters and values they use to find out whether they have a disk controller, drive, or network bottleneck when users are hitting the network full throttle. Ideally I'd like a chart of some kind targeted at this problem, not a general server health overview. The processor seems fine and there seems to be plenty of free memory (it's a Dell NAS unit with 2 GB of RAM). I suspect that either the network or the disk system is choked, but I don't know what combination of counters to use to test my theory. Some of my research suggests that a value of X for one counter may look bad but isn't for your setup, so you have to factor in counters Y and Z to get the overall picture.

Dell Powervault NAS, 2 GB RAM, 3.4 GHz processor, Powervault OS 3.4.9.1 (Windows 2003)

Best Answer

PAL (Performance Analysis of Logs) cuts through a lot of the ambiguities and does a nice job of showing trouble spots.