Using HAProxy to balance slow and fast requests across multiple backends

haproxy

I'm new to using HAProxy to load balance my app, which currently runs five backend application instances. My setup is relatively straightforward. I have a Ruby app that uses fibers, EventMachine, and thin to keep it as non-blocking as possible. So most requests will return quickly, or at the very least they won't block the application server, so it can serve multiple requests at once.

However, there are some requests (such as image uploads which need to access shell commands) which can be slow and block the application server.

What I'm finding is that a straightforward round-robin style of balancing doesn't make sense here, since requests which could be handled simultaneously and returned quickly get backlogged behind the slow requests. What's the best way to handle this?

I've considered:

  • Having a health check which runs frequently (say every 250ms) to check whether the server is responding. If it's not, assume it's "down" (most likely blocking on a long request), which will cause HAProxy to route requests around it. However, in this scenario there is a possibility that all 5 instances could become blocked.

  • Having a pre-defined list of slow request URLs, designating 2 or 3 of the application backends to only handle slow requests, and routing all others to the "fast" backends. In theory fast requests will never get blocked, but this approach seems a bit more brittle, since I'll need to make sure that if the URLs ever change, I remember to update my HAProxy config.

I think the latter approach is probably best, but since devops isn't my strong suit, I thought I would check and see what best practice is in this scenario.

Of course, "best practice" is probably to have all long-running requests moved to background tasks, but in this case let's assume I don't have time for that right now if it's avoidable 🙂

Best Answer

A little from column A, a little from column B :)

For the sake of uptime, you should definitely be using health checks in HAProxy regardless of anything else. If one of your backend nodes goes down, you want HAProxy to stop sending requests to it. The failure doesn't have to be in your application; it could be hardware, network, whatever. This is pretty straightforward to configure:

option httpchk GET /test HTTP/1.0

250ms sounds like a very frequent check. At that rate, your backend servers could spend a lot of time just processing health checks. You need to trade off the cost of the health checks, in terms of application overhead, against how quickly you want dead nodes to be taken offline.
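For reference, the check interval is tunable per server with the inter, fall, and rise parameters. A sketch (the address and timings below are illustrative, not recommendations):

backend regular-backend
    option httpchk GET /test HTTP/1.0
    server backend1 1.2.3.4:80 check inter 2000 fall 3 rise 2

Here inter 2000 probes every 2 seconds, fall 3 marks the server down after 3 consecutive failed checks, and rise 2 brings it back after 2 consecutive successes.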

The second strategy is one I've used before. Figure out the total number of concurrent connections your application can handle. Then, in HAProxy, split the requests into slow and fast and allocate a proportion of your total connections to each. E.g.,

frontend myfrontend
    bind *:80
    acl url-slow path /some-long-running-request
    use_backend slow-backend if url-slow
    default_backend regular-backend

backend slow-backend
    ...
    server backend1 1.2.3.4:80 maxconn 10

backend regular-backend
    ...
    server backend1 1.2.3.4:80 maxconn 90

Say backend1 can handle 100 concurrent connections. In the above, we allocate 90 connections to 'regular' requests and 10 connections to slow-running requests. Even if you use up all of your slow connections, the 'regular' requests will still be processed.
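One detail worth knowing: when a server reaches its maxconn, HAProxy queues further requests for it rather than rejecting them outright. You can bound how long a request waits in that queue with timeout queue, so that a flood of slow requests eventually gets an error instead of hanging indefinitely (the value here is just an example):

backend slow-backend
    timeout queue 30s
    server backend1 1.2.3.4:80 maxconn 10

Requests that wait in the queue longer than 30 seconds will be answered with a 503, which is usually preferable to an open-ended stall for something like an image upload.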

Good luck!
