Why does the BGP OPEN message get "Connect Socket: Connection reset by peer" when the node is on a different subnet/gateway?

bare-metal, bgp, calico, kubernetes

My network setup:

[diagram: Kubernetes network setup]

With this setup, only nodes on the same subnet can establish a BGP connection. The remaining nodes complete the full 3-way TCP handshake, but respond to the OPEN message with a [FIN, ACK] followed by a [RST], hence the "Connection reset by peer" message in my calicoctl node status output below, taken on controller 3 (10.0.3.100):

IPv4 BGP status
+--------------+-------------------+-------+----------+--------------------------------+
| PEER ADDRESS |     PEER TYPE     | STATE |  SINCE   |              INFO              |
+--------------+-------------------+-------+----------+--------------------------------+
| 10.0.1.100   | node-to-node mesh | start | 07:12:01 | Connect Socket: Connection     |
|              |                   |       |          | closed                         |
| 10.0.2.100   | node-to-node mesh | start | 07:12:01 | Connect                        |
| 10.0.1.101   | node-to-node mesh | start | 07:12:01 | Connect Socket: Connection     |
|              |                   |       |          | reset by peer                  |
| 10.0.1.102   | node-to-node mesh | start | 07:12:01 | Connect Socket: Connection     |
|              |                   |       |          | reset by peer                  |
| 10.0.2.102   | node-to-node mesh | start | 07:12:01 | Connect Socket: Connection     |
|              |                   |       |          | reset by peer                  |
| 10.0.3.101   | node-to-node mesh | up    | 07:14:13 | Established                    |
| 10.0.3.102   | node-to-node mesh | up    | 07:12:02 | Established                    |
+--------------+-------------------+-------+----------+--------------------------------+

My Wireshark dump of the handshake + OPEN message from controller 3 (10.0.3.100) to node 4 (10.0.2.102):

[screenshot: Wireshark BGP trace between 10.0.3.100 and 10.0.2.102]
[screenshot: Wireshark BGP trace between 10.0.0.4 (10.0.3.100) and 10.0.2.102]
Maybe the issue is that node 4 sees the data coming from 10.0.0.4 and not 10.0.3.100?
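
One way to test that hypothesis without touching BIRD is the minimal sketch below (assuming Python 3 is available on both hosts; port 17900 is an arbitrary free port, nothing Calico-specific): run the serve side on node 4 and the connect side on controller 3, then check whether node 4 reports the peer as 10.0.3.100 or as the gateway address 10.0.0.4.

    # check_src.py - a rough check of which source address node 4 actually sees.
    # Port 17900 is an arbitrary unprivileged port (assumption, not used by Calico).
    import socket
    import sys

    def serve(port=17900):
        """Run on node 4: accept one connection and print the peer address."""
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
            srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
            srv.bind(("0.0.0.0", port))
            srv.listen(1)
            conn, peer = srv.accept()
            print(f"connection arrived from {peer[0]}:{peer[1]}")  # 10.0.3.100 if no NAT, 10.0.0.4 if NATed
            conn.close()

    def connect(host, port=17900):
        """Run on controller 3: open a plain TCP connection to node 4."""
        with socket.create_connection((host, port), timeout=5) as c:
            print("connected, local address:", c.getsockname())

    if __name__ == "__main__":
        if sys.argv[1] == "serve":
            serve()
        else:
            connect(sys.argv[2])

Usage: python3 check_src.py serve on node 4, then python3 check_src.py connect 10.0.2.102 on controller 3.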

What works

  1. Ping from every node to every other node succeeds
  2. nc to port 179 on all nodes succeeds (a scripted version of this check is sketched after this list)
  3. Wireshark shows the full TCP handshake from controller 3 to node 4
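
For completeness, a scripted version of check 2 (just a sketch; the peer list is copied from the status table above, and like nc it only proves the TCP handshake completes, not that the BGP session survives the OPEN):

    # Rough equivalent of the manual nc checks: try to open TCP port 179 on every peer.
    import socket

    # Peer addresses copied from the calicoctl node status output above.
    PEERS = ["10.0.1.100", "10.0.2.100", "10.0.1.101",
             "10.0.1.102", "10.0.2.102", "10.0.3.101", "10.0.3.102"]

    for ip in PEERS:
        try:
            with socket.create_connection((ip, 179), timeout=3):
                print(f"{ip}:179 reachable (TCP handshake completed)")
        except OSError as exc:
            print(f"{ip}:179 failed: {exc}")

All of these succeed, which is consistent with the failure above: the reset only happens after the handshake, once the OPEN message has been sent.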

Setup

  1. Kubernetes 1.21.1 (installed via kubespray)
  2. Calico 3.9 (default in kubespray)
  3. All gateways are pfSense 2.5.x; the "master" gateway has static routes for
     10.0.1.0/24 via 10.0.0.2, 10.0.2.0/24 via 10.0.0.3, and 10.0.3.0/24 via 10.0.0.4
     (restated as a lookup table after this list).
  4. Firewalls are disabled on the datacenter routers on both WAN and LAN. No NAT is
     enabled on any of the pfSense boxes (the only NAT is for the IPsec VPN on the
     master gateway's WAN port).
  5. As far as I can tell, I have full IP connectivity between all nodes in all subnets.
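
To make item 3 concrete, here is the master gateway's static route table restated as a small lookup sketch (only the routes listed above; no pfSense specifics assumed):

    # The master gateway's static routes from item 3, as a next-hop lookup.
    import ipaddress

    STATIC_ROUTES = {
        ipaddress.ip_network("10.0.1.0/24"): ipaddress.ip_address("10.0.0.2"),
        ipaddress.ip_network("10.0.2.0/24"): ipaddress.ip_address("10.0.0.3"),
        ipaddress.ip_network("10.0.3.0/24"): ipaddress.ip_address("10.0.0.4"),
    }

    def next_hop(dst: str):
        addr = ipaddress.ip_address(dst)
        for net, gw in STATIC_ROUTES.items():
            if addr in net:  # the prefixes do not overlap, so the first match is enough
                return gw
        return None  # anything else falls through to the default route

    # Controller 3 (10.0.3.100) -> node 4 (10.0.2.102) should be forwarded via 10.0.0.3,
    # and the return traffic comes back via 10.0.0.4.
    print(next_hop("10.0.2.102"))  # 10.0.0.3
    print(next_hop("10.0.3.100"))  # 10.0.0.4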

Best Answer

I wrongly assumed pfSense's automatic outbound NAT was only for IPsec passthrough. Once I disabled all outbound NAT rule generation, BGP peering started working as intended. My fault for not understanding the setting on my pfSense routers.
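
My understanding of why the outbound NAT broke the mesh (an illustration only, not BIRD's or Calico's actual code): a BGP speaker only accepts sessions from addresses it has configured as peers, so once the gateway rewrote the source of controller 3's packets to 10.0.0.4, node 4 no longer recognised the peer and tore the session down right after the OPEN.

    # Illustration only (not BIRD's actual logic). Node 4's peers in the
    # node-to-node mesh are all the other nodes' addresses.
    NODE4_PEERS = {"10.0.1.100", "10.0.1.101", "10.0.1.102",
                   "10.0.2.100",
                   "10.0.3.100", "10.0.3.101", "10.0.3.102"}

    def accept_session(source_ip: str) -> bool:
        """Would node 4 recognise this source address as a configured BGP peer?"""
        return source_ip in NODE4_PEERS

    print(accept_session("10.0.3.100"))  # True:  no outbound NAT, session establishes
    print(accept_session("10.0.0.4"))    # False: source rewritten to the gateway, session reset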