Nginx – 502 Bad Gateway errors once the website hits 68 concurrent users

docker, nginx, ruby-on-rails, ubuntu, unicorn

I'm running into issues while doing some stress testing with JMeter. Essentially, we're hitting a hard limit at 68 concurrent users: as soon as the test ramps up to that many users, we start getting 502 Bad Gateway errors.

What's interesting is that we see exactly the same failures at 68 users on a VM with double the CPU and RAM, which leads me to believe this is a configuration issue. After all, the configurations are identical between the Docker containers on each server.

I've tried raising the worker_connections setting in nginx.conf but that has no effect. I even restarted the machine to make sure the new setting was being applied.
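
For what it's worth, the change was along these lines in nginx.conf (the numbers here are placeholders rather than our exact production values):

worker_processes auto;

events {
  # default is 768/1024 depending on the package; raised well above the
  # ~68 connections where the failures start, with no change in behavior
  worker_connections 4096;
}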

Are there any other ideas for what to look into or try?

I'm not sure if this helps, but here's our configuration on the nginx server that's failing…

upstream unicorn_server {
  server unix:/app/tmp/unicorn.sock fail_timeout=0;
  keepalive 512;
}

server {
  listen 4043 ssl;

  ssl_certificate /etc/nginx/certs/hive.crt;
  ssl_certificate_key /etc/nginx/certs/hive.key;

  gzip            on;
  gzip_min_length 1000;
  gzip_proxied    expired no-cache no-store private auth;
  gzip_types      application/json;

  root /app/public;
  try_files $uri @unicorn_server;

  keepalive_timeout 10;

  location @unicorn_server {
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header Host $http_host;
    proxy_set_header X-Forwarded-Proto https; # if using SSL
    proxy_redirect off;
    proxy_pass http://unicorn_server;
    proxy_http_version 1.1;
  }

  location ~ ^/(assets|images|javascripts|stylesheets|swfs|system)/ {
    gzip_static on;
    expires max;
    add_header Cache-Control public;
    add_header Last-Modified "";
    add_header ETag "";

    open_file_cache max=1000 inactive=500s;
    open_file_cache_valid 600s;
    open_file_cache_errors on;
    break;
  }
}
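
(One detail I noticed while pasting this, in case it matters: per the nginx docs, the keepalive directive in the upstream block only takes effect when the proxied connection uses HTTP/1.1 and the Connection header is cleared in the proxy location, roughly like the lines below in location @unicorn_server. I haven't confirmed whether that's related to the 502s.)

    proxy_http_version 1.1;
    proxy_set_header Connection "";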

Best Answer

This may not be a site issue. It can also happen due to problems between your load generator and the target. Can you tell us more about your test infrastructure? Where are the load generators located relative to the application/server under test? Do your requests need to traverse any proxies? How many hops are they traversing, and could any of those be limiting your requests?