wait() is a blocking operation: it causes the current thread to wait until another thread invokes notify() or notifyAll() on the same monitor. This means that the thread in the pool will wait, but from the outside it just looks like the current task is taking a long time to complete. It also means that if 5 tasks are executed and they all wait(), the Executor cannot handle the remaining tasks that, ahem, wait in the queue.
True, the waiting executor thread goes to sleep, allowing other threads to be scheduled and consume CPU (so you can have hundreds of threads waiting at the same time and your system stays responsive), but the thread is still "unusable" and blocked.
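A minimal sketch of this blocking behavior (the class and field names here are illustrative, not from the question):

```java
public class WaitNotifyDemo {
    private static final Object lock = new Object();
    private static boolean ready = false;

    public static void main(String[] args) throws InterruptedException {
        Thread waiter = new Thread(() -> {
            synchronized (lock) {
                while (!ready) {             // guard against spurious wakeups
                    try {
                        lock.wait();         // releases the lock and blocks here
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                }
                System.out.println("woken up");
            }
        });
        waiter.start();

        Thread.sleep(200);                   // let the waiter block first
        synchronized (lock) {
            ready = true;
            lock.notifyAll();                // wakes the waiting thread
        }
        waiter.join();
    }
}
```

Until notifyAll() runs, the waiter thread is exactly the kind of "unusable" blocked thread described above.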
Another interesting feature is interruption: if a thread is waiting or sleeping, you can interrupt it. Note that both wait() and Thread.sleep() declare InterruptedException. With an ExecutorService you can take advantage of this by simply calling future.cancel(true) (future is the object you got back when submitting the task to the ExecutorService; passing true asks the executor to interrupt the running task).
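To make the cancellation mechanics concrete, here is a small runnable sketch; the 60-second sleep stands in for any blocking call:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class CancelDemo {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Future<?> future = pool.submit(() -> {
            try {
                Thread.sleep(60_000);               // simulates a long blocking wait
            } catch (InterruptedException e) {
                System.out.println("interrupted");  // cancel(true) lands here
            }
        });

        Thread.sleep(100);           // give the task time to start sleeping
        future.cancel(true);         // true = interrupt the thread running the task
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```

Without the `true` argument a task that is already running is left alone, so the interrupt would never be delivered.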
Finally, I think you should redesign your solution. Instead of actively waiting for an external system to finish, provide an API with callbacks:
pool.execute(new Runnable() {
    public void run() {
        try {
            doSomethingAndCallMeBackWhenItsDone(new Callback() {
                public void done() {
                    doSomethingElse();
                }
            });
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
});
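The Callback type is not spelled out above; a minimal hypothetical version matching the names used here, with a stand-in for the external system, could look like this:

```java
public class CallbackDemo {
    // Hypothetical callback interface matching the example above.
    interface Callback {
        void done();
    }

    // Stand-in for the external system: in reality it would return
    // immediately and invoke the callback later, from its own thread.
    static void doSomethingAndCallMeBackWhenItsDone(Callback callback) {
        callback.done();
    }

    public static void main(String[] args) {
        doSomethingAndCallMeBackWhenItsDone(
            () -> System.out.println("done callback fired"));
    }
}
```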
This way the external system's API will simply notify you when the results are ready, and you won't have to wait and block the ExecutorService. Finally, if doSomethingElse() takes a lot of time, you might even decide to schedule it on the pool as well rather than running it on the external third-party I/O thread:
pool.execute(new Runnable() {
    public void run() {
        try {
            doSomethingAndCallMeBackWhenItIsDone(new Callback() {
                public void done() {
                    pool.submit(new Callable<Void>() {
                        public Void call() {
                            doSomethingElse();
                            return null;
                        }
                    });
                }
            });
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
});
UPDATE: you asked what to do about timeouts. Here is my idea:
pool.execute(new Runnable() {
    public void run() {
        try {
            doSomethingAndCallMeBackWhenItsDone(new Callback() {
                public void done() {
                    doSomethingElse();
                }
                public void timeout() {
                    // oops!
                }
            });
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
});
I guess you can implement the timeout on the third-party side, and if a timeout occurs there, simply call the timeout() method.
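If the third party doesn't offer timeouts, one way to build them yourself (a sketch, not the third-party API; the Callback interface mirrors the hypothetical one above) is a ScheduledExecutorService that fires timeout() unless done() was reported first:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

public class TimeoutDemo {
    interface Callback {            // hypothetical, mirrors the example above
        void done();
        void timeout();
    }

    public static void main(String[] args) throws Exception {
        ScheduledExecutorService scheduler =
                Executors.newSingleThreadScheduledExecutor();
        AtomicBoolean finished = new AtomicBoolean(false);

        Callback callback = new Callback() {
            public void done()    { System.out.println("done"); }
            public void timeout() { System.out.println("timeout"); }
        };

        // Fire timeout() after 100 ms unless done() arrived first.
        // The external system would call finished.compareAndSet(false, true)
        // before invoking done(), so only one of the two paths ever runs.
        scheduler.schedule(() -> {
            if (finished.compareAndSet(false, true)) {
                callback.timeout();
            }
        }, 100, TimeUnit.MILLISECONDS);

        // Simulate the external system taking too long (never calls done()).
        Thread.sleep(300);
        scheduler.shutdown();
    }
}
```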
This article might help you understand the settings a little better.
minFreeThreads: This setting is used by the worker process to queue all the incoming requests if the number of available threads in the thread pool falls below the value of this setting. This setting effectively limits the number of requests that can run concurrently to maxWorkerThreads - minFreeThreads. Set minFreeThreads to 88 * # of CPUs. This limits the number of concurrent requests to 12 (assuming maxWorkerThreads is 100).
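The arithmetic behind that 12, spelled out (assuming a single CPU, since both settings are per-CPU values):

```java
public class ThreadLimitDemo {
    public static void main(String[] args) {
        int cpus = 1;
        int maxWorkerThreads = 100;  // per CPU
        int minFreeThreads = 88;     // per CPU, as recommended above

        // Requests queue once free threads drop below minFreeThreads,
        // so at most (maxWorkerThreads - minFreeThreads) run concurrently.
        int concurrentLimit = (maxWorkerThreads - minFreeThreads) * cpus;
        System.out.println(concurrentLimit); // 12
    }
}
```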
Edit:
In this SO post, Thomas provides more detail and examples of request handling in the integrated pipeline. Be sure to read the comments on the answer for additional explanations.
A native callback (in webengine.dll) picks up request on CLR worker
thread, we compare maxConcurrentRequestsPerCPU * CPUCount to total
active requests. If we've exceeded limit, request is inserted in
global queue (native code). Otherwise, it will be executed. If it was
queued, it will be dequeued when one of the active requests completes.
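The queue-or-execute decision described in that quote can be sketched like this (illustrative Java, not the actual native webengine.dll code; names are mine):

```java
import java.util.ArrayDeque;
import java.util.Queue;

public class RequestGate {
    private final int limit;         // maxConcurrentRequestsPerCPU * cpuCount
    private int active = 0;
    private final Queue<Runnable> globalQueue = new ArrayDeque<>();

    RequestGate(int maxConcurrentRequestsPerCPU, int cpuCount) {
        this.limit = maxConcurrentRequestsPerCPU * cpuCount;
    }

    // Called when a request arrives on a worker thread.
    synchronized void onRequest(Runnable request) {
        if (active >= limit) {
            globalQueue.add(request);    // over the limit: park in the queue
        } else {
            active++;
            request.run();               // under the limit: execute it
        }
    }

    // Called when an active request completes: dequeue the next one, if any.
    synchronized void onComplete() {
        active--;
        Runnable next = globalQueue.poll();
        if (next != null) {
            active++;
            next.run();
        }
    }

    public static void main(String[] args) {
        RequestGate gate = new RequestGate(1, 1);  // limit = 1
        gate.onRequest(() -> System.out.println("request 1 runs"));
        gate.onRequest(() -> System.out.println("request 2 runs")); // queued
        gate.onComplete();  // request 1 finished, so request 2 is dequeued
    }
}
```

Request 2 only prints after onComplete() frees a slot, which is the dequeue-on-completion behavior the quote describes.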
Best Answer
Your test is not correctly structured to test your hypothesis. If you go over this section in the docs you will see that Play has a few thread pools/execution contexts. The one that matters for your question is the default thread pool and how it relates to the HTTP requests served by your action.
As the doc describes, the default thread pool is where all application code runs by default, i.e. all action code, including all Futures (that don't explicitly define their own execution context), will run in this execution context/thread pool. So, using your example: all the code in your action not commented with // *** will run in the default thread pool. I.e. when a request gets routed to your action:
- the Future with the Thread.sleep will be dispatched to your custom execution context
- the action does not wait for that Future to complete (it runs in its own thread pool [Context.blockingPool] and therefore does not block any threads on the default thread pool)
- the Ok("Done") statement is evaluated and the client receives the response
- eventually the Future completes
So to explain your observation: when you send 100 simultaneous requests, Play will gladly accept those requests, route them to your controller action (executing on the default thread pool), dispatch to your Future and then respond to the client. The default size of the default pool is 1 thread per core, up to a max of 24. Given that your action does very little (excluding the Future), you will be able to handle into the 1000's of requests/sec without breaking a sweat. Your Futures will however take much longer to work through the backlog, because you are blocking the only thread in your custom pool (blockingPool).
If you use my slightly adjusted version of your action, you will see log output that confirms the above explanation: all your requests are dealt with swiftly first, and then your Futures complete one by one afterwards.
So in conclusion, if you really want to test how Play will handle many simultaneous requests with only one thread handling requests, then you can use the following config:
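The actual config from the original answer isn't reproduced here, but in Play 2.x the Akka-based default dispatcher can be pinned to a single thread with something along these lines in application.conf (a sketch only; the exact key prefix varies between Play versions, so check the thread-pools page of your version's docs):

```
akka {
  actor {
    default-dispatcher {
      fork-join-executor {
        parallelism-min = 1
        parallelism-max = 1
      }
    }
  }
}
```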
You might also want to add a Thread.sleep to your action like this (to slow the default thread pool's lonesome thread down a bit). Now you will have 1 thread for requests and 1 thread for your Futures. If you run this with high concurrent connections, you will notice that the client blocks while Play handles the requests one by one, which is what you expected to see.