I'm running Docker on AWS EC2 instances, and I'd like to block certain containers from accessing
the EC2 instance metadata (at IP address 169.254.169.254). I thought I could do this by running
those containers as a specific user (e.g. userx), in the presence of the following iptables rule:
$ iptables -A OUTPUT -m owner --uid-owner userx -d 169.254.169.254 -j DROP
This blocks the connection as expected when the container is run with host networking:
$ docker run -it --rm --network host -u $(id -u userx):$(id -g userx) appropriate/curl http://169.254.169.254/latest/meta-data/
...blocks..
But sadly it allows the connection when the container runs within its own network:
$ docker run -it --rm -u $(id -u userx):$(id -g userx) appropriate/curl http://169.254.169.254/latest/meta-data/
...show metadata...
How can I make this work? Or alternatively, is there some other technique that will give specific containers full network access whilst blocking the instance metadata?
Best Answer
Your issue is that OUTPUT doesn't catch packets coming out of containers. FORWARD does. Why is that?
Every Docker container runs in its own network namespace. Every network namespace has its own routing table and iptables rules, and behaves exactly as if it was a separate physical machine.
In iptables:
- INPUT matches packets going to local processes.
- FORWARD matches packets coming in one network interface and going out another one (being routed through).
- OUTPUT matches packets coming from local processes.
The key is that "local process" means "a process in this network namespace", not "a process in this machine".
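To make the namespace point concrete, here is a small sketch that checks whether two processes share a network namespace by comparing their /proc/<pid>/ns/net links. The helper name same_netns is made up for this example, and it assumes a Linux host with /proc mounted.

```shell
# Sketch: succeed when two PIDs share a network namespace (Linux /proc only).
# same_netns is a hypothetical helper written for this example.
same_netns() {
  local a b
  a="$(readlink "/proc/$1/ns/net")" || return 2
  b="$(readlink "/proc/$2/ns/net")" || return 2
  [ "$a" = "$b" ]
}

# A process always shares a namespace with itself.
if same_netns "$$" "$$"; then
  echo "same network namespace"
else
  echo "different network namespace"
fi
```

To compare the host shell with a container, pass the PID printed by docker inspect -f '{{.State.Pid}}' <container> as the second argument (reading another process's namespace link normally needs root). A container on its own network should come out as different, while one started with --network host should come out as the same, which is why the OUTPUT rule only matched in the host-networking case.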
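Let's analyze what's going on:
1. The process in the container generates a packet, which traverses the OUTPUT chain in the container's network namespace iptables (which is empty!).
2. The packet leaves the container through its end of the veth interface.
3. The packet arrives in the host's network namespace through the other end of the veth interface.
4. The host routes the packet towards eth0.
5. Because it is being routed through, the packet traverses the FORWARD chain in the host's network namespace.
6. The packet leaves through eth0.
Therefore, the solution is putting your rule in the FORWARD chain instead.
The issue is that -m owner doesn't work in FORWARD. According to man iptables-extensions, the owner match is only valid in the OUTPUT and POSTROUTING chains, because forwarded packets do not have any socket associated with them.
You can either hardcode the containers' IP addresses, or put the containers you want filtered in a dedicated network and match its whole range with a FORWARD rule.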
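A minimal sketch of the dedicated-network approach could look like the following. The network name no-metadata, the subnet 172.30.0.0/24 and the DRY_RUN switch are assumptions made for this example, so pick a range that is unused on your host. The rule is inserted with -I rather than appended with -A, on the assumption that Docker's own ACCEPT rules already sit in FORWARD and would otherwise match first. By default the script only prints the commands.

```shell
# Sketch only: network name, subnet and DRY_RUN are assumptions for this example.
set -eu

METADATA_IP="169.254.169.254"
NETWORK_NAME="${NETWORK_NAME:-no-metadata}"   # hypothetical network name
SUBNET="${SUBNET:-172.30.0.0/24}"             # hypothetical subnet; must be unused

# Print each command by default; set DRY_RUN=0 (as root) to really execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

block_metadata() {
  # Dedicated bridge network whose whole range will be filtered.
  run docker network create --subnet "$SUBNET" "$NETWORK_NAME"
  # Drop forwarded packets from that range to the metadata address.
  # -I puts the rule at the top of FORWARD, ahead of existing rules.
  run iptables -I FORWARD -s "$SUBNET" -d "$METADATA_IP" -j DROP
}

block_metadata
```

After running it for real (DRY_RUN=0, as root), a container started with --network no-metadata should no longer reach http://169.254.169.254/latest/meta-data/, while containers on other networks are unaffected. On recent Docker versions the DOCKER-USER chain is the place Docker documents for user rules like this, so substituting it for FORWARD may be the more robust choice.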
Also, using owner is probably not a good idea either way, because processes inside Docker containers can change their UIDs via e.g. setuid binaries (like sudo), if there are any in the image.