Docker VPN Routing – Internal Traffic to Container Access

Tags: docker, openvpn, routing

I have a VPS running multiple Docker containers. I have a Nextcloud instance, which gets SSL/TLS termination from an nginx proxy (certificates from Let's Encrypt). And I have an OpenVPN container. In its Docker network I also host further services (my own BIND DNS server and a Git server), which I can reach through the VPN.

Now I also want to reach my Nextcloud instance through the VPN. Originally I thought this wouldn't be a problem, since the Nextcloud instance can be reached through the internet and the VPN also provides a connection to the internet. But unfortunately I cannot reach it. If I curl my server (HTTP or HTTPS) through the VPN, I get "port 80/443: No route to host". Without the VPN the connection works correctly.

If I use traceroute, I can see that it correctly reaches the public IP of my VPS. So I conclude that it is a routing problem: traffic targeted at port 80/443 on the public IP of my VPS doesn't get forwarded/routed to the nginx proxy container (which exposes those ports).

As I understand it, Docker uses firewalld/iptables to route traffic between and to containers, so different rules are applied to the VPN traffic than to traffic coming from the internet. What do I need to configure, and how, so that the VPN traffic (internal to the server) to my public IP address is forwarded correctly to the corresponding container? I would like connectivity to stay the same between VPN and non-VPN states, so that my Nextcloud app does not get confused.

What I have tried:
I tried a few possible workarounds. I could add a dedicated DNS entry for my Nextcloud instance in my VPN DNS server, pointing either to the IP of the Nextcloud app container (where I would lose SSL/TLS termination) or to the nginx proxy. In the latter case the nginx proxy does not forward the traffic to the Nextcloud container, since a different hostname is used. I want to leave the proxy configuration unchanged if possible, since it is filled automatically at container startup by the Let's Encrypt companion container. The certificates would also not match the FQDN used. If I try to add a master zone with my real/public DNS name (so that I can use the same FQDN as from the outside), all other names under that domain no longer get forwarded (is there a way to configure BIND for that?).

TL;DR: Traffic from a Docker container to the public VPS IP address does not get forwarded to the correct Docker container the way traffic from outside does.

If you need more information about the containers used, I will add links and my docker-compose files.

EDIT:

[root@XXXXXXXX ~]# iptables -S FORWARD
-P FORWARD ACCEPT
-A FORWARD -j DOCKER-USER
-A FORWARD -j DOCKER-ISOLATION-STAGE-1
-A FORWARD -o docker0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -o docker0 -j DOCKER
-A FORWARD -i docker0 ! -o docker0 -j ACCEPT
-A FORWARD -i docker0 -o docker0 -j ACCEPT
-A FORWARD -o br-7e5cecc96f4a -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -o br-7e5cecc96f4a -j DOCKER
-A FORWARD -i br-7e5cecc96f4a ! -o br-7e5cecc96f4a -j ACCEPT
-A FORWARD -i br-7e5cecc96f4a -o br-7e5cecc96f4a -j ACCEPT
-A FORWARD -o br-fd56ce52983e -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -o br-fd56ce52983e -j DOCKER
-A FORWARD -i br-fd56ce52983e ! -o br-fd56ce52983e -j ACCEPT
-A FORWARD -i br-fd56ce52983e -o br-fd56ce52983e -j ACCEPT
-A FORWARD -o br-f1ef60d84b48 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -o br-f1ef60d84b48 -j DOCKER
-A FORWARD -i br-f1ef60d84b48 ! -o br-f1ef60d84b48 -j ACCEPT
-A FORWARD -i br-f1ef60d84b48 -o br-f1ef60d84b48 -j ACCEPT
-A FORWARD -o br-b396aa5a2d35 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -o br-b396aa5a2d35 -j DOCKER
-A FORWARD -i br-b396aa5a2d35 ! -o br-b396aa5a2d35 -j ACCEPT
-A FORWARD -i br-b396aa5a2d35 -o br-b396aa5a2d35 -j ACCEPT
-A FORWARD -o br-83ac9a15401e -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -o br-83ac9a15401e -j DOCKER
-A FORWARD -i br-83ac9a15401e ! -o br-83ac9a15401e -j ACCEPT
-A FORWARD -i br-83ac9a15401e -o br-83ac9a15401e -j ACCEPT
-A FORWARD -d 192.168.122.0/24 -o virbr0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -s 192.168.122.0/24 -i virbr0 -j ACCEPT
-A FORWARD -i virbr0 -o virbr0 -j ACCEPT
-A FORWARD -o virbr0 -j REJECT --reject-with icmp-port-unreachable
-A FORWARD -i virbr0 -j REJECT --reject-with icmp-port-unreachable
-A FORWARD -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -i lo -j ACCEPT
-A FORWARD -j FORWARD_direct
-A FORWARD -j FORWARD_IN_ZONES_SOURCE
-A FORWARD -j FORWARD_IN_ZONES
-A FORWARD -j FORWARD_OUT_ZONES_SOURCE
-A FORWARD -j FORWARD_OUT_ZONES
-A FORWARD -m conntrack --ctstate INVALID -j DROP
-A FORWARD -j REJECT --reject-with icmp-host-prohibited

Best Answer

By default, Docker does not allow traffic between two of its containers that are connected to different bridges. It also does not allow traffic from a container to a port that has been mapped to the outside by Docker itself. This is all implemented with iptables.

First off, the mapping of a port to the outside also happens with iptables: it uses a DNAT rule in the nat table. For these rules Docker creates a separate DOCKER chain, so that the same rules apply whether they are reached from PREROUTING or from OUTPUT in the nat table. The DNAT rules are preceded by RETURN jumps that filter out all traffic coming from a Docker bridge. That is the first hurdle.

It looks a bit like this:

-A DOCKER -i br-one -j RETURN
-A DOCKER -i br-two -j RETURN
-A DOCKER ! -i br-one -p tcp -m tcp --dport EXPOSEDPORT -j DNAT --to-destination 172.17.0.2:INTERNALPORT

The DNAT rule can also carry a -d address if you exposed the port on that local address only. No traffic from any Docker bridge can hit the DNAT rule(s), because of the RETURN rules before them. On top of that, the DNAT rule itself does not allow a DNAT back through the same bridge the traffic came from, which wouldn't be necessary anyway, because from the same bridge you can already reach the INTERNALPORT directly.
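If you want to check this against your own host (the bridge names, addresses and ports will of course differ from the example), the relevant nat chains can be listed like this:

iptables -t nat -S PREROUTING   # jump into the DOCKER chain for incoming traffic
iptables -t nat -S OUTPUT       # jump into the DOCKER chain for locally generated traffic
iptables -t nat -S DOCKER       # per-bridge RETURN rules plus the DNAT rules for published ports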

The restriction on traffic between containers on different bridges is implemented in the filter table of iptables. Two custom chains sit at the beginning of the FORWARD chain (whose default policy Docker normally sets to DROP): DOCKER-USER, where you can add your own rules, and DOCKER-ISOLATION-STAGE-1, which handles the isolation between Docker bridges. That chain in turn uses DOCKER-ISOLATION-STAGE-2. The combination of the two basically says: if traffic leaves a Docker bridge and then enters another Docker bridge, DROP it (without ICMP signaling, so the connection just hangs).

It looks like this:

-A FORWARD -j DOCKER-ISOLATION-STAGE-1
-A DOCKER-ISOLATION-STAGE-1 -i br-one ! -o br-one -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -i br-two ! -o br-two -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-2 -o br-one -j DROP
-A DOCKER-ISOLATION-STAGE-2 -o br-two -j DROP
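You can list the isolation chains on your own host in the same way; the br-... names will match the ones in the FORWARD dump from the question:

iptables -t filter -S DOCKER-ISOLATION-STAGE-1
iptables -t filter -S DOCKER-ISOLATION-STAGE-2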

So, if you want traffic from bridge one to hit a DNAT for a port exposed on the outside by a container on bridge two, and you want the traffic to return for a full connection, then you have to do a couple of things:

  • Drop the RETURN rules that stop the traffic from reaching the DNAT in the DOCKER chain in the nat table. You HAVE to remove the RETURN for the source bridge. You CAN leave the RETURN for the destination bridge if you don't want to allow a container on that bridge to access a DNAT-exposed port.

    • iptables -t nat -D DOCKER -i br-one -j RETURN
    • iptables -t nat -D DOCKER -i br-two -j RETURN #Optional if br-one -> br-two
  • Remove the DROP rules for both bridges from the DOCKER-ISOLATION-STAGE-2 chain in the filter table.

    • iptables -t filter -D DOCKER-ISOLATION-STAGE-2 -o br-one -j DROP
    • iptables -t filter -D DOCKER-ISOLATION-STAGE-2 -o br-two -j DROP

Now the lines are open.
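To verify, you can list the two chains again to confirm the rules are really gone, and then test the connection from a client connected to the VPN (the hostname below is just a placeholder for your Nextcloud FQDN):

iptables -t nat -S DOCKER                        # the RETURN for br-one should be gone
iptables -t filter -S DOCKER-ISOLATION-STAGE-2   # the DROP rules for br-one/br-two should be gone
curl -v https://nextcloud.example.com            # run from a VPN client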

Docker does not refresh its rules very often (at least not in the 19.03 version I tested with). It seems to rebuild the rule sets only when the Docker daemon restarts, not when you stop, start, or create a container. You could hook these changes into the service restart to keep them persistent.
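A minimal sketch of such a hook, assuming the two bridges involved are br-one and br-two as in the examples above (substitute the real br-... names from your iptables -S FORWARD output). It could be run from a systemd drop-in for docker.service (an ExecStartPost= line) or any other mechanism that fires after the Docker daemon has started:

#!/bin/sh
# Re-apply the rule removals after every Docker daemon (re)start.
# br-one / br-two are placeholders for the bridge of the OpenVPN network
# and the bridge of the nginx proxy network.

# Let traffic from the VPN bridge reach the DNAT rules for published ports
iptables -t nat -D DOCKER -i br-one -j RETURN 2>/dev/null

# Allow forwarding between the two bridges in both directions
iptables -t filter -D DOCKER-ISOLATION-STAGE-2 -o br-one -j DROP 2>/dev/null
iptables -t filter -D DOCKER-ISOLATION-STAGE-2 -o br-two -j DROP 2>/dev/null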