How to debug frequent “connection reset by peer”

networking tcp

Recently I started seeing very frequent "connection reset by peer" errors on calls to an external provider. My application (the client) is written in Go and makes simple POST requests to the provider over HTTPS.

Some context:

  • The Go client application runs in Docker.
  • The "connection reset by peer" errors are frequent but erratic.
  • The provider says nothing is wrong on their end. Fair enough, an RST
    can come from anywhere between us.

ifconfig output on the host instance:

docker0   Link encap:Ethernet  HWaddr [REDACTED]
          inet addr:172.17.0.1  Bcast:172.17.255.255  Mask:255.255.0.0
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)

eth0      Link encap:Ethernet  HWaddr [REDACTED]
          inet addr:10.208.19.134  Bcast:10.208.19.255  Mask:255.255.255.128
          inet6 addr: fe80::8d:fdff:fe90:f410/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:9001  Metric:1
          RX packets:37685240 errors:0 dropped:0 overruns:0 frame:0
          TX packets:37927624 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:13408927179 (12.4 GiB)  TX bytes:14057395581 (13.0 GiB)

I tried:

  • Ran tcpdump -vv -i eth0 -s 65535 -n dst host [[PROVIDER IP]] -w capture.cap & on the host instance (EC2)
  • Opened the capture with Wireshark and looked for packets matching tcp.flags.reset==1

I couldn't find anything, and I am sure there were "connection reset by peer" errors during the capture (we have logging in place). All I want to understand is where the RST is coming from, if that is possible to determine.

So what options do I have for finding the root cause of these sudden errors?

Best Answer

To prove that the RST packets originate outside your network, try adding this rule on your router:

iptables -I FORWARD -i eth0 -p tcp --tcp-flags RST RST -j DROP

Be aware that RST packets are needed in normal operation, and dropping them could quickly hog your resources. Even for testing, limit the rule to a particular --sport unless the router is very lightly used, and leave it in place only for the duration of the test.

Replace FORWARD with INPUT if you are testing and filtering on a single machine. Replace eth0 with the interface of your actual internet connection.

For rudimentary confirmation, run iptables -nvL | grep spt and check whether the packet counter at the start of the line has increased. The grep matches the rule only if it includes an --sport match.