High Disk IO spikes in HP Windows 2008 Server with SAN and Oracle 11g

oracle-11g storage-area-network

I have an HP ProLiant BL68C G5 server running Windows Server 2008 R2 Standard Edition that is used as an Oracle 11g database server.

The machine has 20 GB of RAM, dual 2.4 GHz Xeon CPUs, a 146 GB SAS drive (RAID 1+0) on a Smart Array P400i as the operating system drive, and an HP EVA FC SAN array for the Oracle files.

I've checked for firmware updates for the FC HBA and the SAN controller, made sure Windows is up to date, and confirmed that I'm using the latest HP drivers.

However, the Oracle database performs slowly. An Oracle consultant took a look at the installation and suggested that there is a problem with the disk subsystem.

Having run perfmon for 15 minutes during a typical busy session, I got the following figures.

% Disk Time: Avg: 61 Max: 15,145

Avg. Disk Read Queue Length: Avg: 1.043 Max: 8.755

Avg. Disk Write Queue Length: Avg: 1.911 Max: 756.456

% Processor Time: Avg: 2.529 Max: 23.655

Avg. Disk sec/Read: Avg: 0.013 Max: 0.041

Avg. Disk sec/Write: Avg: 0.008 Max: 0.153

Memory Available Bytes: Avg: 1.0780e+010 Max: 1.0796e+010
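The latency counters are reported in seconds and the memory counter in bytes, which makes them awkward to compare at a glance. A minimal Python sketch of the unit conversion, using the figures above (the helper names are made up for illustration):

```python
# Convert the perfmon figures quoted above into friendlier units.
# Helper names are illustrative; the input values come from the counters listed.

def seconds_to_ms(seconds):
    """Latency in milliseconds."""
    return seconds * 1000.0

def bytes_to_gib(num_bytes):
    """Size in GiB (2**30 bytes)."""
    return num_bytes / 2**30

counters = {
    "Avg. Disk sec/Read":  {"avg": 0.013, "max": 0.041},
    "Avg. Disk sec/Write": {"avg": 0.008, "max": 0.153},
}

for name, values in counters.items():
    print(f"{name}: avg {seconds_to_ms(values['avg']):.0f} ms, "
          f"max {seconds_to_ms(values['max']):.0f} ms")

print(f"Available memory: avg {bytes_to_gib(1.0780e10):.1f} GiB")
```

Seen this way, the worst write took roughly 153 ms and about half of the RAM was still free during the sample.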

From my understanding, the average figures are good but the maximum figures are really high. I also understand that Disk Time isn't the best figure to use when working with SAN arrays, but the maximum queue length has me worried, since it ties in with what Oracle said about disk access being slow.

I have looked at the network access and there appears to be a maximum of 75 Mbit/s of traffic throughput over the same period, which doesn't seem a lot considering the network uses Gigabit Ethernet.

Has anyone come across a similar situation before, or does anyone have any pointers on how I can investigate it further?

The performance of the machine seems very good to me, but being locked in a battle with Oracle to prove it is their software that is causing disk issues rather than the SAN itself is quite frustrating.

I've tried to be comprehensive with my description but if anyone has any suggestions and requires more information please don't hesitate to ask.

Best Answer

Avg. Disk sec/Read: Avg: 0.013 Max: 0.041

Avg. Disk sec/Write: Avg: 0.008 Max: 0.153

These are the ONLY relevant counters I see. Really. Queue lengths are very hard to judge.

For a high-end SAN, both the average and the maximum numbers are WAY too high. This looks like either an IO bottleneck or a config issue somewhere.

The performance of the machine seems very good to me, but being locked in a battle with Oracle to prove it is their software that is causing disk issues rather than the SAN itself is quite frustrating.

Mostly because it is the SAN. It is SLOW. The numbers would be way too high for a mid-range DAS system like I have (Velociraptors, no SAS discs); for a real SAN they are really, really, really high.

but the Maximum Queue length has me worried, it ties in with what Oracle said that disk access is slow.

Now, this is the tricky thing. Queue length interpretation depends on SO many factors it is not even funny. A maximum write queue length of 756.456 means Oracle dumps a LOT of stuff on the SAN and the SAN does not answer. That clearly indicates a bottleneck. But what do the numbers mean?

On the other hand, sec/Write went from 0.008 to 0.153 seconds. 0.153 is REALLY slow. 0.008 is not really fast to start with (assuming a real SAN).
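One rough way to put the queue figures in context is Little's law (average queue length ≈ request rate × average time per request). The sketch below applies it to the averages from the question to back out an implied I/O rate. It assumes the averages describe the same steady period, which is only approximately true for a 15-minute sample, and the function name is made up for illustration.

```python
def implied_iops(avg_queue_length, avg_latency_seconds):
    """Estimate requests per second via Little's law: L = rate * W."""
    if avg_latency_seconds <= 0:
        raise ValueError("latency must be positive")
    return avg_queue_length / avg_latency_seconds

# Averages taken from the counters in the question.
reads_per_sec = implied_iops(1.043, 0.013)
writes_per_sec = implied_iops(1.911, 0.008)

print(f"implied reads/sec:  {reads_per_sec:.0f}")
print(f"implied writes/sec: {writes_per_sec:.0f}")
```

Comparing these estimates with the measured Disk Reads/sec and Disk Writes/sec counters, if they were captured, would help show whether the load is heavy or whether a modest load is simply meeting slow service times.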

Definitely not an Oracle issue - your disc subsystem is bottlenecking.

Related Solutions

How to uninstall Oracle 11g from Windows

I've run into this problem before on a machine where Windows was managing the swap space. Oracle, thinking it's smarter than Windows, seems to require you to have your swap space discretely allocated. You can do that by following the steps below.

How to:

1. Right-click My Computer > Properties > Advanced.
2. In the Performance section, click 'Settings' and find the Advanced tab.
3. In the Advanced tab, find the Virtual Memory section and click 'Change'.
4. Select a drive with disk space available and provide a min and max size.

I usually go with a setting that's 2x my RAM for both the Min and Max, since Windows chews up a lot of cycles trying to grow the swap file.
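As a worked example of that rule of thumb, here is a small Python sketch that turns a RAM size into the value to type into both size boxes, which take megabytes. The function name and the 8 GB figure are illustrative assumptions, not anything Oracle or Windows prescribes.

```python
def pagefile_size_mb(ram_gb, multiplier=2):
    """Return a fixed page file size in MB using the 2x RAM rule of thumb."""
    return int(ram_gb * 1024 * multiplier)

# Use the same value for the initial and maximum size so Windows
# never has to grow the file.
size_mb = pagefile_size_mb(8)  # 8 GB of RAM is an example value
print(f"Initial size (MB): {size_mb}")
print(f"Maximum size (MB): {size_mb}")
```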

Oracle 11g server rename issue

You can find the config files for the database-specific DBCONSOLE in your ORACLE_HOME under a directory named with the FQDN of the host, an underscore, and the SID, e.g.

myhost.mynetwork_mydatabase
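After a rename it can help to work out both the old and the new directory name. A minimal Python sketch of the naming rule described above; the function name and the ORACLE_HOME path are hypothetical, and `ntpath` is used so Windows path rules apply wherever the script runs.

```python
import ntpath  # Windows path rules, regardless of where this runs

def dbconsole_config_dir(oracle_home, host_fqdn, sid):
    """Build ORACLE_HOME joined with '<host FQDN>_<SID>'."""
    return ntpath.join(oracle_home, f"{host_fqdn}_{sid}")

# Hypothetical ORACLE_HOME; host and SID reuse the example names above.
print(dbconsole_config_dir(r"C:\oracle\product\11.2.0\dbhome_1",
                           "myhost.mynetwork", "mydatabase"))
```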

You could try stopping the dbconsole with

set ORACLE_SID=mydatabase
emctl stop dbconsole

Then try fiddling with the config files and directory names, and restart the console.

If that doesn't work, you can reinstall the dbconsole into a database with command line tools. I'd look up the full set of help on EMCTL.

Update: I had to recreate some DBCONSOLE repositories recently.

The following steps worked well.

Manual DBCONSOLE removal and recreation

1. Remove the existing Windows Service
    Remove HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\OracleDBConsole[SID]

2. Remove the existing setup from the filesystem
    Delete %ORACLE_HOME%\DomainName_SID
    Delete %ORACLE_HOME%\oc4j\j2ee\OC4J_DBConsole_DomainName_SID

3. Remove the SYSMAN schema from the database as SYS or SYSTEM
    drop user sysman cascade;
    drop role MGMT_USER;
    drop user MGMT_VIEW cascade;
    drop public synonym MGMT_TARGET_BLACKOUTS;
    drop public synonym SETEMVIEWUSERCONTEXT;

4. Run the Database Configuration Assistant
    Select the database from the list
    Ensure that the DBCONSOLE option and the Enterprise Manager Repository options are ticked
