High Disk IO spikes in HP Windows 2008 Server with SAN and Oracle 11g

oracle-11g storage-area-network

I have an HP ProLiant BL68C G5 server running Windows Server 2008 R2 Standard Edition that is being used as an Oracle 11g database server.

The machine itself has 20 GB of RAM, dual 2.4 GHz Xeon CPUs, a 146 GB SAS drive (RAID 1+0) on a Smart Array P400i as the operating system drive, and an HP EVA FC SAN array for the Oracle files.

I've checked for firmware updates for the FC HBA and the SAN controller, made sure Windows is up to date, and confirmed that I'm using the latest HP drivers.

However, the Oracle database performs slowly. An Oracle consultant took a look at the Oracle installation and suggested that there is a problem with the disk subsystem.

Having run perfmon for 15 minutes during a typical busy session, I got the following figures:

% Disk Time: Avg: 61 Max: 15,145

Avg. Disk read Queue Length: Avg: 1.043 Max: 8.755

Avg. Disk write queue Length: Avg: 1.911 Max: 756.456

% Processor time: Avg: 2.529 Max: 23.655

Avg. Disk sec /Read: avg: 0.013 Max: 0.041

Avg. Disk sec /Write: Avg: 0.008 Max: 0.153

Memory Available Bytes: avg: 1.0780e+010 Max: 1.0796e+010

From my understanding, the average figures are good but the maximum figures are really high. I also understand that % Disk Time isn't the best figure to use when working with SAN arrays, but the Maximum Queue length has me worried, it ties in with what Oracle said that disk access is slow.

I have looked at the network access and there appears to be a maximum of 75 Mbit/s of traffic throughput over the same period, which doesn't seem a lot considering the network uses Gigabit Ethernet.

Has anyone come across a similar situation before, or does anyone have pointers on how I can investigate it further?

The performance of the machine seems very good to me, but being locked in a battle with Oracle to prove it is their software that is causing disk issues rather than the SAN itself is quite frustrating.

I've tried to be comprehensive with my description but if anyone has any suggestions and requires more information please don't hesitate to ask.

Best Answer

Avg. Disk sec /Read: avg: 0.013 Max: 0.041

Avg. Disk sec /Write: Avg: 0.008 Max: 0.153

These are the ONLY relevant counters I see. Really. Queue lengths are very hard to judge.

For a high-end SAN, both the average and the maximum numbers are WAY too high. Looks like either an IO bottleneck or a config issue somewhere.
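To put those two counters in more familiar units, here is a minimal Python sketch that converts the perfmon values from seconds to milliseconds and compares them with a latency budget. The figures are the ones posted in the question; the 5 ms budget is purely an assumption for illustration, not a vendor specification, so substitute whatever your array is supposed to deliver.

```python
# Values copied from the perfmon output in the question (seconds per I/O).
COUNTERS = {
    "Avg. Disk sec/Read": {"avg": 0.013, "max": 0.041},
    "Avg. Disk sec/Write": {"avg": 0.008, "max": 0.153},
}

# ASSUMPTION: hypothetical latency budget in milliseconds, not from the post.
BUDGET_MS = 5.0


def to_ms(seconds):
    """Convert a perfmon 'sec per I/O' value to milliseconds."""
    return round(seconds * 1000.0, 3)


def report(counters, budget_ms):
    """Return (counter, kind, milliseconds, over_budget) for every value."""
    rows = []
    for name, stats in counters.items():
        for kind, seconds in stats.items():
            ms = to_ms(seconds)
            rows.append((name, kind, ms, ms > budget_ms))
    return rows


if __name__ == "__main__":
    for name, kind, ms, over in report(COUNTERS, BUDGET_MS):
        flag = "over budget" if over else "ok"
        print(f"{name:22} {kind}: {ms:7.1f} ms  ({flag})")
```

With the assumed 5 ms budget every value is flagged, including the averages, and the 153 ms write maximum stands out most.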

The performance of the machine seems very good to me, but being locked in a battle with Oracle to prove it is their software that is causing disk issues rather than the SAN itself is quite frustrating.

Mostly because it is the SAN. It is SLOW. The numbers would be way too high even for a mid-range DAS system like the one I have (VelociRaptors, no SAS discs); for a real SAN they are really, really high.

but the Maximum Queue length has me worried, it ties in with what Oracle said that disk access is slow.

Now, this is the tricky thing. Queue length interpretation depends on SO many factors it is not even funny. A maximum write queue length of about 756 means Oracle dumps a LOT of stuff on the SAN and the SAN does not answer. That indicates a bottleneck, clearly. But what do the numbers mean?

On the other hand, sec/Write went from 0.008 to 0.153 seconds. 0.153 is REALLY slow. 0.008 is not really fast to start with (assuming a real SAN).
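One rough way to give the average queue lengths some meaning is Little's Law (items in the system = arrival rate × time in the system), which lets you back out the I/O rate implied by a queue length and a latency. The sketch below applies it to the averages from the question. It is an estimate under a steady-state assumption, not a measurement, and it should not be applied to the maximum values, since the maxima of two counters need not occur at the same moment. The function name `implied_iops` is just illustrative.

```python
def implied_iops(avg_queue_length, avg_latency_seconds):
    """Little's Law: L = rate * W, so rate = L / W (I/Os per second)."""
    if avg_latency_seconds <= 0:
        raise ValueError("latency must be positive")
    return avg_queue_length / avg_latency_seconds


# Averages taken from the perfmon figures in the question.
read_iops = implied_iops(1.043, 0.013)   # read queue / read latency
write_iops = implied_iops(1.911, 0.008)  # write queue / write latency

if __name__ == "__main__":
    print(f"implied reads/sec:  {read_iops:.0f}")
    print(f"implied writes/sec: {write_iops:.0f}")
```

That works out to roughly 80 reads and 240 writes per second on average. Logging the Disk Reads/sec and Disk Writes/sec counters alongside the latency counters would show the real rates and whether the latency spikes line up with bursts of writes.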

Definitely not an Oracle issue - your disc subsystem is bottlenecking.
