Nginx – SYN flooding on port 443 during nginx reload

Whenever I reload nginx, I get "possible SYN flooding on port 443" errors in the messages log. nginx also seems to become completely unresponsive for quite a while, because Zabbix reports "nginx is down" with a ping of 0s. The load at that time is about 1800 requests per second (RPS).

However, the server stays responsive on its other, non-web ports (SSH, etc.).

Where should I look, and which configs (sysctl, nginx) should I show, to find the root cause?

Thanks in advance.

Some additional info:

$ netstat -tpn |awk '/nginx/{print $6,$7}' |sort |uniq -c
   3266 ESTABLISHED 31253/nginx
   3289 ESTABLISHED 31254/nginx
   3265 ESTABLISHED 31255/nginx
   3186 ESTABLISHED 31256/nginx

nginx.conf sample:

worker_processes  4;
timer_resolution 100ms;
worker_priority -15;
worker_rlimit_nofile 200000;

events {
  worker_connections  65536;
  multi_accept on;
  use epoll;
}

http {

  sendfile on;
  tcp_nopush on;
  tcp_nodelay on;

  keepalive_requests 100;
  keepalive_timeout  65;

}

Custom sysctl.conf:

net.ipv4.ip_local_port_range=1024 65535
net.ipv4.conf.all.accept_redirects=0
net.ipv4.conf.all.secure_redirects=0
net.ipv4.conf.all.send_redirects=0
net.core.netdev_max_backlog=10000
net.ipv4.tcp_syncookies=0
net.ipv4.tcp_max_syn_backlog=20480
net.ipv4.tcp_synack_retries=2
net.ipv4.tcp_syn_retries=2
net.ipv4.tcp_rmem=4096 87380 16777216
net.ipv4.tcp_wmem=4096 65536 16777216
net.core.rmem_max=16777216
net.core.wmem_max=16777216
net.netfilter.nf_conntrack_max=1048576
net.ipv4.tcp_congestion_control=htcp
net.ipv4.tcp_timestamps=1
net.ipv4.tcp_no_metrics_save=1
net.ipv4.tcp_tw_reuse=1
net.ipv4.tcp_tw_recycle=0
net.ipv4.tcp_max_tw_buckets=1400000
net.core.somaxconn=250000
net.ipv4.tcp_keepalive_time=900
net.ipv4.tcp_keepalive_intvl=15
net.ipv4.tcp_keepalive_probes=5
net.ipv4.tcp_fin_timeout=10

Update:

Under normal load (about 1800 RPS), I set backlog=10000 on the nginx listeners for ports 80 and 443 and reloaded nginx. It then started using more RAM (3.8 GB of my 4 GB instance, and the OOM killer terminated some workers), and with worker_priority at -15 the load average went above 6, although the instance has only 4 cores. Because the instance was quite laggy, I set worker_priority to -5 and backlog to 1000 on every port. Memory use is now lower and the peak load was 3.8, but nginx still becomes unresponsive for a minute or two after each reload, so the problem persists.

Some netstat details:

netstat -tpn |awk '/:80/||/:443/{print $6}' |sort |uniq -c
      6 CLOSE_WAIT
     14 CLOSING
  17192 ESTABLISHED
    350 FIN_WAIT1
   1040 FIN_WAIT2
    216 LAST_ACK
    338 SYN_RECV
  52541 TIME_WAIT

Best Answer

That message indicates that your TCP SYN queue is overflowing during the reload. Does the reload take a while to complete? I notice that you have set net.core.netdev_max_backlog, net.ipv4.tcp_max_syn_backlog and net.core.somaxconn to high values, which is good. However, nginx also has to request a large backlog when it opens its listening socket; without a backlog parameter it uses 511 on Linux, so the high kernel limits alone do not help. Set it on the listen directive, for example:

listen 443 backlog=10000;

See http://nginx.org/en/docs/http/ngx_http_core_module.html#listen
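The interaction between the kernel limit and the nginx setting can be sketched in shell. This is a minimal sketch, assuming Linux, where listen(2) silently caps the requested backlog at net.core.somaxconn; the helper name effective_backlog is hypothetical, and 250000 is only a fallback taken from the sysctl.conf in the question for when /proc is not readable.

```shell
#!/bin/sh
# Minimal sketch (assumes Linux): listen(2) silently caps the requested
# backlog at net.core.somaxconn, so the queue nginx gets is the smaller of the two.

# Hypothetical helper: prints min(requested backlog, somaxconn).
effective_backlog() {
    _req=$1
    _cap=$2
    if [ "$_req" -gt "$_cap" ]; then
        echo "$_cap"
    else
        echo "$_req"
    fi
}

# Live value when /proc is readable; otherwise fall back to the
# net.core.somaxconn value from the sysctl.conf in the question.
somaxconn=$(cat /proc/sys/net/core/somaxconn 2>/dev/null || echo 250000)

# Without a backlog= parameter, nginx requests 511 on Linux.
echo "listen 443;               -> $(effective_backlog 511 "$somaxconn")"
echo "listen 443 backlog=10000; -> $(effective_backlog 10000 "$somaxconn")"
```

After a reload, ss -ltn 'sport = :443' shows the backlog the running listener actually got in its Send-Q column (Recv-Q is the current accept-queue length), which is a quick way to confirm the setting took effect.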