Linux – TCP connection through IPSec (Linux/Strongswan) stalls after exceeding PMTU

ipsec linux networking strongswan vpn

The Bacula backups of one of my servers (“A”), which is connected via IPSec (Strongswan on Debian testing) to a storage daemon (“B”), fail to finish about 95% of the time.
What apparently happens is:

  1. Bacula opens a TCP connection to the storage daemon's VPN IP. (A → B)
  2. Since net.ipv4.ip_no_pmtu_disc=0 is the kernel default, the IP Don't Fragment (DF) bit is set in the plaintext packet.
  3. When routing the packet into the IPSec tunnel, the DF bit of the payload is copied to the IP header of the ESP packet.
  4. After some time (often around 20 minutes) and up to several gigabytes of data sent, an ESP packet slightly larger than the previous ones is sent. (A → B)
  5. Because the storage daemon's interface has a lower MTU than the sending host's, a router along the way sends an ICMP type 3, code 4 (Fragmentation Needed and Don't Fragment was Set) error to the host. (some router → A)
  6. The connection stalls, and for some reason host A floods B with ~100 empty duplicate ACKs (within ~20 ms).

(The ICMP packets are reaching host A and there are no iptables rules in place that block ICMP.)
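
A rough way to check steps 5 and 6 from host A is sketched below. It assumes iputils ping and iproute2 are installed. PEER is a hypothetical placeholder for B's VPN address (192.0.2.10 is a documentation address), and the 1472-byte payload is only a starting point that corresponds to a 1500-byte MTU.

```shell
# Hypothetical placeholder: replace with the storage daemon's VPN IP.
PEER=192.0.2.10

# Send a single ping with the DF bit set.
# 1472 bytes of payload + 8 bytes ICMP header + 20 bytes IP header = 1500.
# Lower -s step by step until the ping succeeds to find the size that fits.
ping -M do -s 1472 -c 1 "$PEER"

# Show the route the kernel would use for the peer. If a PMTU has been
# learned for this destination, it may show up as "mtu N" in the output.
ip route get "$PEER"
```

If the ping fails with a "message too long" style error that reports an MTU, the kernel has a PMTU value for that destination. If it simply times out, the ICMP error is probably not being applied.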

Possible reasons why this happens, that I can think of:

  • Kernel bug (Debian 3.13.7-1)
  • Linux's IPSec implementation intentionally ignores the PMTU message as a security measure, since it is unprotected and would affect an existing SA. (This seems to be valid behavior according to RFC 4301, section 8.2.1.)
  • It has something to do with PMTU aging (RFC 4301, section 8.2.2).

What is the best way to fix this without disabling PMTU discovery globally or lowering the interface MTU? Maybe by clearing the DF bit somehow, as FreeBSD does with ipsec.dfbit=0?

Best Answer

You could try creating an iptables rule that sets the TCP MSS for VPN-destined traffic to a lower value, so that each segment plus the ESP overhead fits within the path MTU. Without a packet capture, though, it's difficult to guess what's going on.
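
A minimal sketch of such rules, run as root on host A, might look like the following. It assumes the iptables TCPMSS target and the policy match are available in the kernel. The value 1360 is an arbitrary conservative guess, not derived from this setup, and eth0 and the capture file name are hypothetical placeholders.

```shell
# Lower the MSS announced in SYN / SYN-ACK segments arriving through IPsec.
# This limits the segment size host A uses when sending to B.
iptables -t mangle -A INPUT -m policy --pol ipsec --dir in \
    -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --set-mss 1360

# Do the same for SYN / SYN-ACK segments leaving through IPsec.
# This limits the segment size B uses when sending to A.
iptables -t mangle -A OUTPUT -m policy --pol ipsec --dir out \
    -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --set-mss 1360

# Capture ESP traffic and ICMP "fragmentation needed" errors for later
# analysis. eth0 is an assumed interface name.
tcpdump -ni eth0 -w pmtu-stall.pcap \
    'esp or (icmp[icmptype] == icmp-unreach and icmp[icmpcode] == 4)'
```

The MSS is only negotiated during the handshake, so the rules affect connections opened after they are added. A capture taken during a stall would show whether the ICMP error arrives and whether the ESP packets shrink afterwards.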