Linux – High TCP reset and packet drop count on CentOS Linux

apache-2.2 centos linux packetloss reset

I have a small farm of web servers (HP ProLiant and IBM x, with Broadcom Corporation NetXtreme II BCM5 NICs) running Apache 2.2.15 on CentOS 6, behind a Cisco ACE load balancer, serving a PHP/JS-based web portal. This farm receives a lot of requests daily (it serves a whole small country) from clients trying to access a splash page (and to go on from there to the index page).

I've been struggling with the following problem:

  • I've noticed that requests to the web servers sometimes take quite a "long" time to be answered (from the client's point of view), and sometimes they are not answered at all (timeout on the web client side). In the latter case, I don't even see the request in the Apache logs.

  • I've also noticed that netstat reports an increasing number of TCP resets being sent (netstat -st | grep 'resets sent')

  • Also, dropwatch -l kas shows there are many packets being dropped:

Initalizing kallsyms db
dropwatch> start
Enabling monitoring…
Kernel monitoring activated.
Issue Ctrl-C to stop monitoring
53 drops at tcp_v4_md5_hash_skb+248 (0xffffffff8149fa08)
26 drops at tcp_rcv_established+926 (0xffffffff814981b6)
3 drops at tcp_v4_reqsk_destructor+fa (0xffffffff814a104a)
1 drops at netlink_unicast+251 (0xffffffff81471b11)
56 drops at tcp_v4_md5_hash_skb+248 (0xffffffff8149fa08)
29 drops at tcp_rcv_established+926 (0xffffffff814981b6)
4 drops at tcp_v4_reqsk_destructor+fa (0xffffffff814a104a)
51 drops at tcp_v4_md5_hash_skb+248 (0xffffffff8149fa08)
32 drops at tcp_rcv_established+926 (0xffffffff814981b6)
2 drops at tcp_v4_reqsk_destructor+fa (0xffffffff814a104a)
1 drops at ip_rcv_finish+199 (0xffffffff8147ea49)
1 drops at tcp_v4_destroy_sock+115 (0xffffffff814a0cf5)
1 drops at tcp_v4_reqsk_destructor+fa (0xffffffff814a104a)
22 drops at tcp_rcv_established+926 (0xffffffff814981b6)
36 drops at tcp_v4_md5_hash_skb+248 (0xffffffff8149fa08)
2 drops at tcp_v4_reqsk_destructor+fa (0xffffffff814a104a)
49 drops at tcp_v4_md5_hash_skb+248 (0xffffffff8149fa08)
29 drops at tcp_rcv_established+926 (0xffffffff814981b6)
26 drops at tcp_rcv_established+926 (0xffffffff814981b6)
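Output like this is easier to compare between runs when the counts are summed per kernel symbol. The following is only a sketch: summarize_drops is a hypothetical helper name, and it assumes each drop is reported on its own line in the "N drops at SYMBOL+OFFSET (ADDRESS)" form shown above.

```shell
# Sum dropwatch "N drops at SYMBOL+OFFSET (ADDR)" lines per symbol.
# Reads dropwatch output on stdin; lines in any other form are ignored.
# summarize_drops is a hypothetical helper name, not part of dropwatch.
summarize_drops() {
    awk '$2 == "drops" && $3 == "at" {
             sym = $4
             plus = index(sym, "+")            # strip the +offset suffix
             if (plus > 0) sym = substr(sym, 1, plus - 1)
             total[sym] += $1
         }
         END {
             for (sym in total) printf "%d %s\n", total[sym], sym
         }' | sort -rn                          # largest total first
}

# Example (dropwatch.log is an assumed file holding saved output):
#   summarize_drops < dropwatch.log
```

Sorting by total makes it quick to see which drop points dominate, and whether that ranking changes after a tuning step.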

I've been following recommendations from Red Hat (Red Hat Enterprise Linux Network Performance Tuning Guide), even though I've not seen some of the symptoms described there on my servers. In short:

  • I've increased the NIC ring buffers to maximum.
  • I've fiddled with (increased or changed) several kernel parameters (tcp_syncookies, netdev_budget, tcp_timestamps, tcp_window_scaling, tcp_rmem, dev_weight, tcp_tw_reuse…)
  • I've modified the Apache config according to several "Apache optimization guides" found on the web (even though there were, and still are, idle workers in the Apache stats)
  • I've stopped/disabled any system service/daemon not required (basically all that remains is sshd, httpd and snmpd)
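When changing this many knobs, it may help to record their values before and after each step so that changes can be compared or reverted. This is a rough sketch: the full sysctl key names below are my assumption of where the parameters listed above live (net.ipv4.* and net.core.*), and the function names are made up for illustration.

```shell
# Kernel parameters named in the list above, as full sysctl keys
# (assumed locations; adjust if your kernel differs).
TUNED_KEYS="net.ipv4.tcp_syncookies net.core.netdev_budget net.ipv4.tcp_timestamps
net.ipv4.tcp_window_scaling net.ipv4.tcp_rmem net.core.dev_weight net.ipv4.tcp_tw_reuse"

# Map a sysctl key to its /proc/sys path,
# e.g. net.ipv4.tcp_rmem -> /proc/sys/net/ipv4/tcp_rmem.
sysctl_path() {
    printf '/proc/sys/%s\n' "$(printf '%s' "$1" | tr . /)"
}

# Print "key = value" for every key that is readable on this kernel.
snapshot_sysctls() {
    for key in $TUNED_KEYS; do
        path=$(sysctl_path "$key")
        [ -r "$path" ] && printf '%s = %s\n' "$key" "$(cat "$path")"
    done
    return 0
}

# Example:
#   snapshot_sysctls > sysctl-before.txt
#   ... apply a change ...
#   snapshot_sysctls | diff sysctl-before.txt -
#   ethtool -g eth0        # ring buffer sizes; eth0 is an assumed interface name
```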

All of the above with no luck.

All NICs are working at Speed: 1000Mb/s, CPU and disk usage are low, and neither netstat nor ethtool shows errors.

Any ideas what else can be done?

Best Answer

A TCP reset is an immediate close of a TCP connection. This allows for the resources that were allocated for the previous connection to be released and made available to the system.

Causes of RST generation

Ack, Reset

  1. Sent in response to a SYN. An Ack, Reset sent in response to a SYN frame acknowledges receipt of the frame and tells the client that the server cannot allow the connection on that port. Among the reasons for the Ack, Reset are:

    a. The node being connected to is not listening on the port the client node is trying to connect to.

    b. There is some reason that the server node cannot complete the connection on that port. For example, the server is out of resources and so cannot allocate the needed resources to allow the connection.

RST

  1. If the connection is in any non-synchronized state (LISTEN, SYN-SENT, SYN-RECEIVED) and the incoming segment acknowledges something not yet sent (the segment carries an unacceptable ACK), a reset is sent.

  2. A reset is also generated when a network frame has been sent six times (the original frame plus five retransmits) without a response. As a result, the sending node resets the connection. The exact retry count is implementation-dependent. On Linux the retransmission limit is governed by net.ipv4.tcp_retries2.
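To tell which of these cases applies, it can help to measure how fast the "resets sent" counter grows and then capture the RST segments themselves. A minimal sketch follows; the function names are hypothetical, and it assumes netstat -st prints a line of the form "N resets sent", as in the question.

```shell
# Extract the "resets sent" counter from `netstat -st` output on stdin.
resets_sent() {
    awk '/resets sent/ { print $1; exit }'
}

# Print resets per second given two counter samples and the seconds between them.
# Usage: reset_rate BEFORE AFTER SECONDS
reset_rate() {
    awk -v a="$1" -v b="$2" -v s="$3" 'BEGIN { printf "%.1f\n", (b - a) / s }'
}

# Example:
#   before=$(netstat -st | resets_sent); sleep 60
#   after=$(netstat -st | resets_sent)
#   reset_rate "$before" "$after" 60
#
# To see the resets on the wire (eth0 is an assumed interface name):
#   tcpdump -nn -i eth0 'tcp[tcpflags] & tcp-rst != 0'
```

In the capture, the ports and the segment that precedes each RST usually hint at whether it answers a SYN or interrupts an established connection.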

As you have already tried various kernel tuning parameters, try the kernel's TCP SYN cookies option:

Enable TCP SYN cookie protection

To edit the file /etc/sysctl.conf, run:
# vi /etc/sysctl.conf

Append the following entry:

net.ipv4.tcp_syncookies = 1

Save and close the file. To reload the change, type:
# sysctl -p 
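If you prefer not to edit the file by hand, the same change can be scripted so that it is safe to run more than once. This is only a sketch: set_sysctl_conf is a hypothetical helper that replaces an existing line for the key or appends one, and it assumes the usual "key = value" layout of /etc/sysctl.conf.

```shell
# Set KEY = VALUE in a sysctl.conf-style file: replace existing lines for KEY, else append.
# Usage: set_sysctl_conf KEY VALUE [FILE]   (FILE defaults to /etc/sysctl.conf)
# Hypothetical helper; commented-out lines are left untouched.
set_sysctl_conf() {
    key=$1 value=$2 file=${3:-/etc/sysctl.conf}
    tmp=$(mktemp) || return 1
    awk -v k="$key" -v v="$value" '
        # Take the text before the first "=" (leading whitespace removed) as the key.
        { line = $0; sub(/^[ \t]+/, "", line); split(line, kv, /[ \t]*=/) }
        kv[1] == k { if (!done) print k " = " v; done = 1; next }
        { print }
        END { if (!done) print k " = " v }
    ' "$file" > "$tmp" && cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return $status
}

# Example (as root):
#   set_sysctl_conf net.ipv4.tcp_syncookies 1 && sysctl -p
#   sysctl net.ipv4.tcp_syncookies      # confirm the running value
```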

A definitive solution can only be given after analyzing your logs. iptables can also help.