Optimal value for Phusion Passenger PassengerMaxRequestQueueSize

concurrency · passenger · settings

I know this depends on the box hardware, but for example if 100 processes are configured, the default queue size is also 100. Does it make sense to increase PassengerMaxRequestQueueSize to 200 or 300? This probably depends on free memory. Thoughts?

The best answer would explain the setting and give one or two examples, assuming the server takes 2-3 seconds to process a request.

Thanks in advance!

Best Answer

Why you should limit queuing

Any requests that aren't immediately handled by an application process are queued. Queuing is usually bad: it often means that your server cannot handle requests quickly enough.

A larger queue means that requests are less likely to be dropped. But this comes with a drawback: during busy times, the larger the queue, the longer your visitors have to wait before they see a response. This causes them to click reload, making the queue even longer (their previous request will stay in the queue; the OS does not know that they've disconnected until it tries to send data back to the visitor), or causes them to leave in frustration.

So having a limit on the queue is a good thing. It limits the impact of the above situation.

You should ensure that requests are queued as little as possible. That could mean:

  • Making your app faster (if your workload is CPU bound).
  • Upgrading to faster hardware (if your workload is CPU bound).
  • Increasing your app's concurrency settings (if your workload is I/O bound), e.g. by increasing the number of processes or threads (see the configuration sketch after this list).
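
For example, with Passenger's Apache integration, concurrency is mostly controlled by the directives below. This is only a sketch with made-up values; tune them to your hardware, and note that thread-based concurrency requires Passenger Enterprise.

    # Sketch only: values are illustrative, not recommendations.
    # Allow up to 12 application processes in the pool.
    PassengerMaxPoolSize 12

    # Keep at least 4 processes alive so bursts don't pay spawn time.
    PassengerMinInstances 4

    # Thread-based concurrency (Passenger Enterprise only):
    # PassengerConcurrencyModel thread
    # PassengerThreadCount 16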

If you cannot prevent requests from being queued, then the next best thing to do is to keep the queue short, and to display a friendly error message upon reaching the queue limit. Something like, "We're sorry, a lot of people are visiting us right now. Please try again later." The documentation for PassengerMaxRequestQueueSize tells you how to do that.
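
With the Apache integration that looks roughly like the sketch below (the error page path is hypothetical; check the Passenger reference for your version for the exact error-handling behavior):

    # Reject requests beyond this limit immediately instead of
    # letting visitors wait in a long queue.
    PassengerMaxRequestQueueSize 100

    # Serve a friendly page instead of the default error message.
    # Passenger rejects queue-overflow requests with an error status
    # (503 by default); PassengerErrorOverride lets Apache's
    # ErrorDocument handle it. /errors/busy.html is a hypothetical path.
    PassengerErrorOverride on
    ErrorDocument 503 /errors/busy.html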

Optimal value for the queue size

It's hard to say what the optimal queue size should be. A good rule of thumb is to set the request queue size to the maximum number of requests you can handle in one second. Depending on your situation, you may have to tweak this a little.
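
As a hypothetical worked example using the numbers from the question (assumptions, not measurements): with 100 application processes and requests that take about 2-3 seconds each, you can complete roughly 100 / 2.5 ≈ 40 requests per second, so the rule of thumb points at a queue of around 40-50 rather than 200-300:

    # Assumed: 100 processes, ~2.5 s per request
    # Throughput ≈ 100 / 2.5 ≈ 40 requests per second
    # Rule of thumb: queue size ≈ requests handled per second
    PassengerMaxRequestQueueSize 40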

This rule of thumb comes from the notion of expected burst traffic. How many simultaneous requests do you expect on your server?

Suppose that your queue size is 100, and that for whatever reason you receive 150 requests at the same time. Suppose that your server is fast enough to handle 150 requests in half a second, so you know it's not a performance problem. But if you have a request queue size of 100, then 50 of those requests will be dropped with a "Request queue full" error.

In such a situation, you should set the queue size to the maximum number of concurrent requests that you think you can safely handle without performance issues.
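
For the burst scenario above, a minimal sketch would be to raise the limit just enough to absorb the burst you know you can handle:

    # Expected burst of ~150 simultaneous requests, which the server
    # can clear in about half a second (per the example above).
    PassengerMaxRequestQueueSize 150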