IIS – Why do we get a sudden spike in response times?

iis, multi-threading, performance

We have an API implemented with ServiceStack and hosted in IIS. While load testing the API we discovered that response times are good but deteriorate rapidly as soon as we hit about 3,500 concurrent users per server. We have two servers behind a load balancer, so with 7,000 total users each server sees 3,500 concurrents, and the average response times sit below 500 ms for all endpoints. However, as soon as we increase the total number of concurrent users we see a significant rise in response times: at 5,000 concurrent users per server the average response time per endpoint is around 7 seconds.

Memory and CPU usage on the servers stay quite low, both while the response times are good and after they deteriorate. At peak, with 10,000 concurrent users, the CPU averages just below 50% and the RAM sits around 3-4 GB out of 16 GB. This leaves us thinking that we are hitting some kind of limit somewhere. The screenshot below shows some key perfmon counters during a load test with a total of 10,000 concurrent users; the highlighted counter is requests/second. Towards the right of the screenshot you can see the requests-per-second graph becoming very erratic. This is the main indicator: as soon as we see this pattern, we see slow response times in the load test.

[perfmon screenshot with requests per second highlighted]

How do we go about troubleshooting this performance issue? We are trying to work out whether it is a coding issue or a configuration issue. Are there any settings in web.config or IIS that could explain this behaviour? The application pool is running .NET v4.0 and the IIS version is 7.5. The only change we have made from the default settings is to raise the application pool Queue Length value from 1,000 to 5,000. We have also added the following config settings to the Aspnet.config file:

<system.web>
    <!-- maxConcurrentRequestsPerCPU caps concurrently executing requests
         per CPU (5000 is already the .NET 4 default);
         maxConcurrentThreadsPerCPU="0" disables the per-CPU thread throttle;
         requestQueueLimit is the maximum number of queued requests for the
         process. -->
    <applicationPool 
        maxConcurrentRequestsPerCPU="5000"
        maxConcurrentThreadsPerCPU="0" 
        requestQueueLimit="5000" />
</system.web>
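
One way to test the "hitting some kind of limit" hypothesis directly is to sample the CLR thread pool while the load test runs. Below is a minimal diagnostic sketch (not from the original post; the 5-second interval and console output are arbitrary choices):

using System;
using System.Threading;

class ThreadPoolMonitor
{
    static void Main()
    {
        // Samples thread pool headroom every 5 seconds. Busy worker counts
        // that climb towards the maximum while CPU stays low usually mean
        // threads are blocked on locks or synchronous I/O rather than working.
        var timer = new Timer(_ =>
        {
            int maxWorkers, maxIo, freeWorkers, freeIo;
            ThreadPool.GetMaxThreads(out maxWorkers, out maxIo);
            ThreadPool.GetAvailableThreads(out freeWorkers, out freeIo);
            Console.WriteLine("Workers busy: {0}/{1}  IO busy: {2}/{3}",
                maxWorkers - freeWorkers, maxWorkers,
                maxIo - freeIo, maxIo);
        }, null, TimeSpan.Zero, TimeSpan.FromSeconds(5));

        Console.ReadLine(); // keep sampling until Enter is pressed
        timer.Dispose();
    }
}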

More details:

The purpose of the API is to combine data from various external sources and return it as JSON. It currently uses an InMemory cache implementation to cache individual external calls at the data layer. The first request for a resource fetches all the data required, and any subsequent requests for the same resource get results from the cache. We have a 'cache runner', implemented as a background process, that updates the information in the cache at set intervals. We have added locking around the code that fetches data from the external resources. We have also implemented the services that fetch data from the external sources in an asynchronous fashion, so that an endpoint should only be as slow as the slowest external call (unless we have the data in the cache, of course). This is done using the System.Threading.Tasks.Task class. Could we be hitting a limit on the number of threads available to the process?
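
For context, here is a minimal sketch of the pattern described above, assuming hypothetical FetchFromSourceA/FetchFromSourceB calls and a plain dictionary cache rather than the actual implementation:

using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

class ResourceService
{
    static readonly ConcurrentDictionary<string, string> Cache =
        new ConcurrentDictionary<string, string>();

    public string Get(string resourceId)
    {
        string cached;
        if (Cache.TryGetValue(resourceId, out cached))
            return cached;                       // cache hit: no external calls

        // Cache miss: run both external calls in parallel and wait for the
        // slowest one, mirroring the behaviour described in the question.
        var a = Task.Factory.StartNew(() => FetchFromSourceA(resourceId));
        var b = Task.Factory.StartNew(() => FetchFromSourceB(resourceId));
        Task.WaitAll(a, b);                      // blocks this worker thread

        var combined = a.Result + b.Result;
        Cache[resourceId] = combined;
        return combined;
    }

    // Hypothetical stand-ins for the real external data sources.
    static string FetchFromSourceA(string id) { return "A:" + id; }
    static string FetchFromSourceB(string id) { return "B:" + id; }
}

Note that in this shape each cache miss ties up one pool thread blocked in Task.WaitAll plus one per external call, so a burst of misses can drain the thread pool while CPU stays low, which would match the symptoms above.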

Best Answer

Following on from @DavidSchwartz and @Matt, this looks like a thread and lock management issue.

I suggest:

  1. Freeze the external calls and the cache generated for them, and run the load test with static external information, just to rule out any issue unrelated to the server/environment side.

  2. Use thread pools if you are not already using them.

  3. About the external calls, you said: "We have also implemented the services to fetch the data from the external sources in an asynchronous fashion so that the endpoint should only be as slow as the slowest external call (unless we have data in the cache of course)."

The questions are:

  - Have you checked whether any cache data is locked during the external call, or only when the external call's result is written to the cache? (Too obvious, but it must be said.)
  - Do you lock the whole cache or small parts of it? (Too obvious, but it must be said. See the per-key locking sketch after this list.)
  - Even if the external calls are asynchronous, how often do they run? Even if they don't run often, they could be blocked by an excessive number of requests to the cache from user calls while the cache is locked. This scenario usually shows a fixed percentage of CPU used, because many threads are waiting at fixed intervals and the locking itself must also be managed.
  - Have you checked whether the external tasks' response times also increase when the slow scenario arrives?
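
To illustrate the whole-cache versus per-key locking question, here is a minimal sketch (names are hypothetical) in which each key has its own lock, so a slow external refresh of one resource never blocks readers of other resources:

using System;
using System.Collections.Concurrent;

class PerKeyLockingCache
{
    readonly ConcurrentDictionary<string, object> _locks =
        new ConcurrentDictionary<string, object>();
    readonly ConcurrentDictionary<string, string> _values =
        new ConcurrentDictionary<string, string>();

    public string GetOrFetch(string key, Func<string, string> fetch)
    {
        string value;
        if (_values.TryGetValue(key, out value))
            return value;                          // lock-free read on a hit

        var keyLock = _locks.GetOrAdd(key, k => new object());
        lock (keyLock)                             // contends only on this key
        {
            if (!_values.TryGetValue(key, out value))  // double-check inside lock
            {
                value = fetch(key);                // the slow external call
                _values[key] = value;
            }
        }
        return value;
    }
}

With a single global lock, by contrast, every cache miss serialises all concurrent requests behind the slowest external call, which produces exactly the low-CPU, high-latency profile described in the question.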

If the problem still persists, I'd suggest avoiding the Task class and making the external calls through the same thread pool that manages the user requests, to avoid the previous scenario. A rough sketch of this is below.
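
As an illustration of that suggestion (the fetch methods are hypothetical stand-ins), the external calls can be queued directly onto the shared CLR thread pool and joined with a CountdownEvent:

using System.Threading;

class DirectPoolFetcher
{
    public static string[] FetchBoth(string resourceId)
    {
        var results = new string[2];
        using (var done = new CountdownEvent(2))
        {
            // Queue both external calls on the same pool that serves requests.
            ThreadPool.QueueUserWorkItem(_ =>
            {
                results[0] = FetchFromSourceA(resourceId);
                done.Signal();
            });
            ThreadPool.QueueUserWorkItem(_ =>
            {
                results[1] = FetchFromSourceB(resourceId);
                done.Signal();
            });
            done.Wait(); // still blocks the calling thread until both finish
        }
        return results;
    }

    // Hypothetical stand-ins for the real external data sources.
    static string FetchFromSourceA(string id) { return "A:" + id; }
    static string FetchFromSourceB(string id) { return "B:" + id; }
}

Note that Task.Factory.StartNew schedules work onto this same pool by default, so this change mainly strips away the Task machinery rather than moving the work elsewhere; measuring thread pool usage first is worthwhile before rewriting.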
