Load testing HTTP server with large number of concurrent connections

benchmarkinghttpload-testing

I'm trying to load test/benchmark a http server with a very large number of simultaneous connections (10-100k). What is a good procedure for doing this? On linux I've seen that both the client and host likely need to have the number of permitted threads to be explicitly increased.

Also, does anyone have a good feel for how many client machines are needed for testing 10k and up connections? Is one machine enough, or does there tend to be a cap on the number of sockets a machine can handle?

I'm currently using nperf to generate loads. I've been successful up to around 1-2k concurrent requests, after which some of the requests come back failed. I'm not sure if the failures are of the server, or the client and I'm reluctant (lazy) to provision more machines for use as clients if the issue can be resolved with the one I have.

Best Answer

Without any more details about your set-up, I'm tempted to say that you need to try it out.

The best way to benchmark your load generators is to try an easy load on a very light test page on your server (something that is going to cause the least amount of load on the http server). This will test your set-up's ability to generate the load so you can decide how many generators you need. You'll need to monitor your generators particularly for CPU, network, and the timeliness of each request (i.e. are they all sent out to schedule or do they build-up and lag behind?). Check the error logs on your generators, check for dropped requests.

Then you can move on to your real test. Now for the real test you may need to adjust your load generators set-up, depending on how complex your test case is, but the benchmark above will give you a good starting point.

Related Solutions

C# Load Testing – Generating Per Second Requests

I don't have all the answers. Hopefully I can shed some light on it.

To simplify my previous statements about .NET's threading models, just know that Parallel Library uses Tasks, and the default TaskScheduler for Tasks, uses the ThreadPool. The higher you go in the hierarchy (ThreadPool is at the bottom), the more overhead you have when creating the items. That extra overhead certainly doesn't mean it's slower, but it's good to know that it's there. Ultimately the performance of your algorithm in a multi-threaded environment comes down to its design. What performs well sequentially may not perform as well in parallel. There are too many factors involved to give you hard and fast rules, they change depending on what you're trying to do. Since you're dealing with network requests, I'll try and give a small example.

Let me state that I am no expert with sockets, and I know next to nothing about Zeroc-Ice. I do know about bit about asynchronous operations, and this is where it will really help you. If you send a synchronous request via a socket, when you call Socket.Receive(), your thread will block until a request is received. This isn't good. Your thread can't make any more requests since it's blocked. Using Socket.Beginxxxxxx(), the I/O request will be made and put in the IRP queue for the socket, and your thread will keep going. This means, that your thread could actually make thousands of requests in a loop without any blocking at all!

If I'm understanding you correctly, you are using calls via Zeroc-Ice in your testing code, not actually trying to reach an http endpoint. If that's the case, I can admit that I don't know how Zeroc-Ice works. I would, however, suggest following the advice listed here, particularly the part: Consider Asynchronous Method Invocation (AMI). The page shows this:

By using AMI, the client regains the thread of control as soon as the invocation has been sent (or, if it cannot be sent immediately, has been queued), allowing the client to use that thread to perform other useful work in the mean time.

Which seems to be the equivalent of what I described above using .NET sockets. There may be other ways to improve the performance when trying to do a lot of sends, but I would start here or with any other suggestion listed on that page. You've been very vague about the design of your application, so I can be more specific than I have been above. Just remember, do not use more threads than absolutely necessary to get what you need done, otherwise you'll likely find your application running far slower than you want.

Some examples in pseudocode (tried to make it as close to ice as possible without me actually having to learn it):

var iterations = 100000;
for (int i = 0; i < iterations; i++)
{
    // The thread blocks here waiting for the response.
    // That slows down your loop and you're just wasting
    // CPU cycles that could instead be sending/receiving more objects
    MyObjectPrx obj = iceComm.stringToProxy("whateverissupposedtogohere");
    obj.DoStuff();
}

A better way:

public interface MyObjectPrx : Ice.ObjectPrx
{
    Ice.AsyncResult GetObject(int obj, Ice.AsyncCallback cb, object cookie);
    // other functions
}

public static void Finished(Ice.AsyncResult result)
{
    MyObjectPrx obj = (MyObjectPrx)result.GetProxy();
    obj.DoStuff();
}

static void Main(string[] args)
{
    // threaded code...
    var iterations = 100000;
    for (int i = 0; i < iterations; i++)
    {
        int num = //whatever
        MyObjectPrx prx = //whatever
        Ice.AsyncCallback cb = new Ice.AsyncCallback(Finished);
        // This function immediately gets called, and the loop continues
        // it doesn't wait for a response, it just continually sends out socket
        // requests as fast as your CPU can handle them.  The response from the
        // server will be handled in the callback function when the request
        // completes.  Hopefully you can see how this is much faster when 
        // sending sockets.  If your server does not use an Async model 
        // like this, however, it's quite possible that your server won't 
        // be able to handle the requests
        prx.GetObject(num, cb, null);
    }
}

Keep in mind that more threads != better performance when trying to send sockets (or really doing anything). Threads are not magic in that they will automatically solve whatever problem you're working on. Ideally, you want 1 thread per core, unless a thread is spending much of its time waiting, then you can justify having more. Running each request in its own thread is a bad idea, since context switches will occur and resource waste. (If you want to see everything I wrote about that, click edit and look at the past revisions of this post. I removed it since it only seemed to cloud the main issue at hand.)

You can definitely make these request in threads, if you want to make a large number of requests per second. However, don't go overboard with the thread creation. Find a balance and stick with it. You'll get better performance if you use an asynchronous model vs a synchronous one.

I hope that helps.

HTTP Server – How to Write an HTTP Server

Use the RFC2616, Luke!

You read the RFC 2616 on HTTP/1.1, and you go for it.

That was actually a project in my 3rd year in engineering school, and that's pretty much the project description.

Tools

Your tools are:

basic networking stuff (socket management, binding, understand addresses),
good understanding of I/O streams,
a lot patience to get some shady parts of the RFC (mime-types are fun).

Fun Considerations

Things to consider for extra fun:

plug-in architecture to add CGI / mod support,
configuration files for, well, many things,
lots of experimentation on how to optimize transfers,
lots of experimentation to see how to manage load in terms of CPU and memory, and to pick a dispatch model (big fat even loop, single accept dispatch, multi-thread, multi-process, etc...).

Have fun. It's a very cool thing to look at.

Other (Simpler) Suggestions

FTP client/server (mostly RFC959 but there are older versions and also some extensions)
IRC client/server (mostly RFC1459, but there are extensions)

They're way easier to tackle first, and their RFCs are a lot easier to digest (well, the IRC one has some odd parts, but the FTP one is pretty clear).

Language Choice

Of course, some implementation details will be highly dependant on the language and stack you use to implement it. I approached all that in C, but I'm sure it can be fun just as well in other languages (ok, maybe not as much fun, but still fun).