Terminal Server disk performance

Tags: performance, raid5, terminal-server, windows-server-2003, windows-terminal-services

Our users experience slow session performance at times during normal work hours. Applications (IE, Office apps, etc.) are slow to respond, and so is switching between them. The problem happens sporadically; the troubleshooting done so far is described below.

We started gathering performance counters throughout the day and asked users to report when slowdowns occur. The graphs below show disk performance. The arrows point to the times when users reported slowdowns, and they indicate that the problem is disk related.

[Image: disk use graphs]

Can anyone suggest further troubleshooting in order to track the culprit process/application?

Some server specs: [OS: Server 2003 32-bit Enterprise with /PAE flag] [RAM: 32GB] [CPU: 2x quad core @ 2.27GHz] [HD: RAID5 1.2GB 3xSAS 10,000RPM HD. Controller has no battery and write cache is disabled]

Using Process Explorer I can look at the running processes and see which ones do the most disk reads and writes.

Processes with highest DISK WRITES: System, ccSvcHst.exe (Symantec process), firefox.exe

Processes with highest DISK READS: winlogon.exe, firefox.exe, explorer.exe

Processes with highest DISK WRITE BYTES: System, firefox.exe, ccSvcHst.exe

Processes with highest DISK READ BYTES: System, winlogon.exe, firefox.exe

Best Answer

Write caching disabled and RAID5? That is a particularly poor-performing combination. Windows leans heavily on the user profiles, so the appdata and registry activity alone would surface this issue on such a slow storage subsystem. There could be other aggravating factors, such as the default registry lazy flush interval being too short, which makes flushes too frequent.

The registry lazy flush interval may be increased by adjusting the following DWORD registry value:

Key: HKLM\System\CurrentControlSet\Control\Session Manager\Configuration Manager  
Value: RegistryLazyFlushInterval 

Use 60 (decimal) to specify 60 seconds. I believe the default value is 5 seconds.
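
If you would rather import the change than edit it by hand, something like the following could generate a .reg file for the setting above. This is only a sketch: the function name `lazy_flush_reg_text` is made up for illustration, and importing the result with regedit is an assumption about your workflow. The one detail worth noting is that .reg files store DWORD values in hexadecimal, so 60 decimal has to be written as 0000003c.

```python
# Sketch: build .reg file text that sets RegistryLazyFlushInterval.
# The key and value name come from the answer; the helper name is hypothetical.
KEY = (r"HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control"
       r"\Session Manager\Configuration Manager")
VALUE_NAME = "RegistryLazyFlushInterval"


def lazy_flush_reg_text(seconds: int) -> str:
    """Return .reg file text that sets the lazy flush interval in seconds."""
    if not 0 < seconds <= 0xFFFFFFFF:
        raise ValueError("interval must be a positive value that fits in a DWORD")
    # .reg files expect DWORDs as 8 hex digits, not decimal.
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{KEY}]",
        f'"{VALUE_NAME}"=dword:{seconds:08x}',
        "",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Print the text so it can be reviewed before saving it as a .reg file.
    print(lazy_flush_reg_text(60))
```

Review the output before importing it, and export the existing key first so the change can be rolled back.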

The registry in particular is predisposed to locking issues. One issue we encountered on Windows Server 2003 appeared after an Internet Explorer security hotfix and was related to the Browser Helper Object (BHO) for Java. You can read more about that here:

https://serverfault.com/a/110242/20701

Twenty users seems a bit low for performance issues to appear; however, it is difficult to know, because that really depends on the applications in use and on user type and behavior. While you may be able to address some of the issues by increasing the lazy flush interval or ruling out the Java BHO, I would start by addressing the problematic disk subsystem.
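
To put rough numbers on why the disk subsystem comes first, here is a back-of-the-envelope sketch. It uses the common rule of thumb that a small random write on RAID5 costs four disk I/Os (read data, read parity, write data, write parity) when there is no write cache to absorb or coalesce it. The per-disk figure of 130 IOPS for a 10,000RPM drive and the 50% write mix are assumptions for illustration, not measurements from this server.

```python
def effective_iops(disks, iops_per_disk, write_fraction, write_penalty=4):
    """Estimate host-visible random IOPS for an array.

    write_penalty=4 is the usual RAID5 rule of thumb for small random
    writes with no write-back cache; it is an approximation.
    """
    if not 0.0 <= write_fraction <= 1.0:
        raise ValueError("write_fraction must be between 0 and 1")
    raw = disks * iops_per_disk
    read_fraction = 1.0 - write_fraction
    # Each host read costs 1 disk I/O, each host write costs write_penalty.
    return raw / (read_fraction + write_fraction * write_penalty)


if __name__ == "__main__":
    # Assumed inputs: 3 disks, ~130 IOPS each, half of the I/O is writes.
    print(round(effective_iops(3, 130, 0.5)))  # 156
    # Same disks with a read-only workload, for comparison.
    print(round(effective_iops(3, 130, 0.0)))  # 390
```

Under those assumptions the array delivers well under half of its raw random I/O capacity, and that budget is shared by every session on the server. Substitute your own counter data for the write fraction to get a figure that reflects your workload.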