High disk time on SQL Server

Tags: io, performance, sql-server

We have a dedicated server running SQL Server 2008 R2 Enterprise Edition.

The setup is:

  • D: (data files) – local SSD disks in RAID 10 (not the same disks as the log files)
  • E: (log files) – local SSD disks in RAID 1 (not the same disks as the data files)
  • F: (transaction log backups) – remote storage on a SAN

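To verify which files actually live on which volume, sys.master_files shows the physical path of every database file. A minimal sketch (the volume column assumes local paths such as D:\...):

    -- List every database file and the volume it lives on,
    -- to confirm data files sit on D: and log files on E:.
    SELECT DB_NAME(database_id) AS database_name,
           type_desc,                        -- ROWS (data) or LOG
           name AS logical_name,
           physical_name,
           LEFT(physical_name, 2) AS volume  -- drive letter, e.g. D:
    FROM sys.master_files
    ORDER BY volume, database_name;
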
Today we moved our log files to new disks, from F: to E: – that is, from a shared SAN volume to dedicated local disks.

What happened then was that "disk time", "avg. transfer time" and "avg. disk write queue length" increased on the volume holding the data files (D:), not on the volume where the log files are located.

The data volume and the log volume do not share disks; they do, however, share the same controller card.

"Disk idle time" is low for all volumes.

One thought, of course, is that the controller card might be overloaded, but we need more ideas about where the problem might be.
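
One way to see whether SQL Server itself is waiting on I/O (independently of the Perfmon counters) is sys.dm_io_virtual_file_stats. A minimal sketch; the counters are cumulative since the last restart, so comparing two snapshots taken around a busy period is more telling than a single sample:

    -- Per-file I/O latency derived from cumulative stall counters.
    SELECT DB_NAME(vfs.database_id) AS database_name,
           mf.physical_name,
           vfs.num_of_reads,
           vfs.num_of_writes,
           CASE WHEN vfs.num_of_reads  = 0 THEN 0
                ELSE vfs.io_stall_read_ms  / vfs.num_of_reads  END AS avg_read_ms,
           CASE WHEN vfs.num_of_writes = 0 THEN 0
                ELSE vfs.io_stall_write_ms / vfs.num_of_writes END AS avg_write_ms
    FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
    JOIN sys.master_files AS mf
      ON mf.database_id = vfs.database_id
     AND mf.file_id     = vfs.file_id
    ORDER BY avg_write_ms DESC;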

UPDATE:

The RAID controller is a Dell PERC H700 (512 MB cache). The server is a Dell R910.

We have about 2,500 transactions/sec during peak hours.
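
(That figure can be reproduced from sys.dm_os_performance_counters; since Transactions/sec is a cumulative counter, it has to be sampled twice. A minimal sketch, using a 10-second window:)

    -- Transactions/sec is cumulative, so take two samples
    -- and divide the delta by the elapsed time.
    DECLARE @t1 BIGINT, @t2 BIGINT;

    SELECT @t1 = cntr_value
    FROM sys.dm_os_performance_counters
    WHERE counter_name = 'Transactions/sec'
      AND instance_name = '_Total';

    WAITFOR DELAY '00:00:10';   -- sample window: 10 seconds

    SELECT @t2 = cntr_value
    FROM sys.dm_os_performance_counters
    WHERE counter_name = 'Transactions/sec'
      AND instance_name = '_Total';

    SELECT (@t2 - @t1) / 10 AS transactions_per_sec;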

The "disk time" counter has been at 100% since we moved the log files (even during low-traffic periods).

The "disk idle time", however, is about 98-99% for both D: (data files) and E: (log files).

We have write-back cache enabled for both the data disks and the log disks.

The wait stats look like this:

wait_type                     wait_time_s
---------                     ----------- 
BROKER_TASK_STOP               1283336.21 
FT_IFTS_SCHEDULER_IDLE_WAIT     101357.47 
PAGELATCH_EX                     89712.72 
BROKER_TRANSMITTER               75894.76
XE_TIMER_EVENT                   38778.35
REQUEST_FOR_DEADLOCK_SEARCH      38770.35
SQLTRACE_INCREMENTAL_FLUSH_SLEEP 38767.03 
FT_IFTSHC_MUTEX                  38759.14
LOGMGR_QUEUE                     38632.87
CHECKPOINT_QUEUE                 38382.63
BROKER_EVENTHANDLER              35082.42   
XE_DISPATCHER_WAIT               34396.31  
DISPATCHER_QUEUE_SEMAPHORE       33578.68
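
A listing like the one above can be produced from sys.dm_os_wait_stats (a minimal sketch; the exact query we used may have differed):

    -- Top waits since the last restart, converted to seconds.
    -- Note: most of the waits listed above (BROKER_*, XE_*, LOGMGR_QUEUE,
    -- CHECKPOINT_QUEUE, ...) are benign background waits that are usually
    -- filtered out before analysis; PAGELATCH_EX is arguably the only
    -- interesting one here.
    SELECT TOP (15)
           wait_type,
           wait_time_ms / 1000.0 AS wait_time_s
    FROM sys.dm_os_wait_stats
    ORDER BY wait_time_ms DESC;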

Best Answer

After some more research (checking waits and running the site under peak traffic), the "problem" described above turned out not to be a problem at all.

The symptoms appeared when we removed a bottleneck (the old log storage): once the transaction logs were on faster disks, the data disks could handle more transactions/sec, and the queue length therefore increased.

That also explains why the disk idle time was good.

The "disk time" counter seems to be pretty much useless for a fast disk system (one that uses caching, etc.).
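
If you need a number that stays meaningful on fast, cached storage, per-file latency (sys.dm_io_virtual_file_stats, as sketched above) or the I/O actually in flight is more telling than "disk time". A minimal sketch using sys.dm_io_pending_io_requests, assuming the usual join on file handles:

    -- I/O requests currently in flight; a more direct signal of disk
    -- saturation than the "% Disk Time" Perfmon counter. The inner join
    -- drops any non-database-file I/O that has no matching file handle.
    SELECT mf.physical_name,
           pio.io_type,
           pio.io_pending,              -- 1 = still pending at the OS level
           pio.io_pending_ms_ticks      -- how long the request has waited
    FROM sys.dm_io_pending_io_requests AS pio
    JOIN sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
      ON vfs.file_handle = pio.io_handle
    JOIN sys.master_files AS mf
      ON mf.database_id = vfs.database_id
     AND mf.file_id     = vfs.file_id
    ORDER BY pio.io_pending_ms_ticks DESC;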