AWS: Error accessing the Internet with a custom Network ACL

amazon-web-services, networking, routing

I've been stuck looking at my screen for about 2 hours trying to figure out why this is not working. I'm building the typical VPC with a private and a public subnet and trying to lock it down as much as possible. I have a few security groups, but I know the issue is in the NACL, because if I relax the rules, everything works.

My inbound NACL is

[image: Inbound ACL]

and the outbound is

[image: outbound ACL]

The problem I have is that I cannot access the internet (ports 80 and 443) from inside any of the EC2 instances in either the public or private subnets. I know the "problem rule" is inbound no 1000, which allows all ephemeral ports from 10.0.0.0/16. If I change this rule to apply to all sources (0.0.0.0/0), I can access the internet from all EC2 instances in both subnets (I'm testing this by running different apps, mostly curl and yum).

I'm just struggling to understand this behaviour, as the inbound rule should allow any EC2 instance to open an ephemeral port and talk to the router, and then the router can talk to ports 80 and 443 of any host. I have the feeling I'm missing a simple but crucial point here :).

edit

In ASCII art, this is my understanding of the rules

EC2 instance does a curl on www.google.com (port 80) 

SYN packet out to establish the connection (ephemeral port on VM to port 80)
EC2 vm (somewhere in 10.0.0.0/16:ephemeral)
 -> SG 
 -> NACL in (in rule 1000 - ALLOW source 10.0.0.0/16 on ports 1024-65535) 
 -> NACL out (out rule 200 - ALLOW destination 0.0.0.0/0 on port 80)
 -> IGW 
 -> Google (172.217.23.14:80)

SYN + ACK Packet in to continue the connection handshake
Google (172.217.23.14:80)
  -> IGW 
  -> NACL in (in rule 100 - ALLOW source 0.0.0.0/0 on port 80) 
  -> NACL out (out rule 100 - ALLOW destination 0.0.0.0/0 on port 1024-65535) 
  -> SG 
  -> EC2 vm (somewhere in 10.0.0.0/16:ephemeral)

Edit 2

Running tcpdump (sudo tcpdump -i eth0 -s 1500 port not 22) to ensure that Google is not returning data from an ephemeral port. I've trimmed the packet flags and other extra data from the output; you can add -X to the command to see the actual data in each packet.

18:28:56.919335 IP ip-10-112-7-114.eu-west-1.compute.internal.40174 > prg03s06-in-f4.1e100.net.http:
18:28:56.949105 IP prg03s06-in-f4.1e100.net.http > ip-10-112-7-114.eu-west-1.compute.internal.40174:
18:28:56.949119 IP ip-10-112-7-114.eu-west-1.compute.internal.40174 > prg03s06-in-f4.1e100.net.http:
18:28:56.949219 IP ip-10-112-7-114.eu-west-1.compute.internal.40174 > prg03s06-in-f4.1e100.net.http:
18:28:56.979089 IP prg03s06-in-f4.1e100.net.http > ip-10-112-7-114.eu-west-1.compute.internal.40174:
18:28:57.010155 IP prg03s06-in-f4.1e100.net.http > ip-10-112-7-114.eu-west-1.compute.internal.40174:
18:28:57.010178 IP ip-10-112-7-114.eu-west-1.compute.internal.40174 > prg03s06-in-f4.1e100.net.http:
18:28:57.010308 IP ip-10-112-7-114.eu-west-1.compute.internal.40174 > prg03s06-in-f4.1e100.net.http:
18:28:57.041100 IP prg03s06-in-f4.1e100.net.http > ip-10-112-7-114.eu-west-1.compute.internal.40174:
18:28:57.041110 IP ip-10-112-7-114.eu-west-1.compute.internal.40174 > prg03s06-in-f4.1e100.net.http:

From there you can see that curl opened port 40174 and Google (prg03s06-in-f4.1e100.net) replied from port 80 (http in the log).

Answer

Thanks to Tim and Michael-sqlbot. The answer is a bit buried in the comments, but the problem was a misunderstanding of how inbound rules work. The port range on an inbound rule refers to the destination port, not the source port. From the AWS docs:

The following are the parts of a network ACL rule:

  • Rule number. Rules are evaluated starting with the lowest numbered rule. As soon as a rule matches traffic, it's applied regardless of any higher-numbered rule that may contradict it.
  • Protocol. You can specify any protocol that has a standard protocol number. For more information, see Protocol Numbers. If you specify ICMP as the protocol, you can specify any or all of the ICMP types and codes.
  • [Inbound rules only] The source of the traffic (CIDR range) and the destination (listening) port or port range.
  • [Outbound rules only] The destination for the traffic (CIDR range) and the destination port or port range.
  • Choice of ALLOW or DENY for the specified traffic.
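
The rule semantics quoted above can be sketched in a few lines of Python. This is only an illustrative model, not how AWS implements NACLs: the `Rule` class and `evaluate` function are hypothetical names, the protocol field is ignored, and only the inbound rules named in the question (100 and 1000) are modelled. It shows why Google's SYN+ACK, which arrives from an Internet address and is addressed to the instance's ephemeral port, falls through to the implicit deny until rule 1000 accepts 0.0.0.0/0 as a source.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class Rule:
    """Hypothetical, simplified NACL rule (protocol is ignored)."""
    number: int
    cidr: str       # inbound: source CIDR; outbound: destination CIDR
    port_from: int  # matched against the packet's DESTINATION port
    port_to: int
    allow: bool = True


def evaluate(rules, peer_ip, dst_port):
    """Return True if the packet is allowed.

    Rules are checked from the lowest number upward and the first match
    wins; if nothing matches, the packet is denied.
    """
    for rule in sorted(rules, key=lambda r: r.number):
        in_cidr = ip_address(peer_ip) in ip_network(rule.cidr)
        in_ports = rule.port_from <= dst_port <= rule.port_to
        if in_cidr and in_ports:
            return rule.allow
    return False


# Inbound rules as described in the question.
inbound_original = [
    Rule(100, "0.0.0.0/0", 80, 80),
    Rule(1000, "10.0.0.0/16", 1024, 65535),
]

# Same rules, with rule 1000 relaxed to accept any source.
inbound_fixed = [
    Rule(100, "0.0.0.0/0", 80, 80),
    Rule(1000, "0.0.0.0/0", 1024, 65535),
]

# The SYN+ACK from the tcpdump: source 172.217.23.14:80,
# destination port 40174 on the instance.
reply_source_ip = "172.217.23.14"
reply_dst_port = 40174

print(evaluate(inbound_original, reply_source_ip, reply_dst_port))  # False
print(evaluate(inbound_fixed, reply_source_ip, reply_dst_port))     # True
```

With the original rules, rule 100 fails on the port (40174 is not 80) and rule 1000 fails on the source (172.217.23.14 is not inside 10.0.0.0/16), so the reply is dropped.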

Best Answer

When you make a connection to port 80 (or to any daemon on any port), your client sends from a high-range source port chosen by the operating system, and the server's replies come back from port 80 to that same high-range port. These are called ephemeral ports.

You need to allow incoming traffic to these high-range ports, which according to Wikipedia are 32768 to 61000 on many Linux systems. If you're providing a web server you probably need to allow them outgoing as well, which you have as rule 100.

Update / expanded: NACLs are stateless, which means you need to allow ports in each direction data needs to flow. When you connect to a web server on port 80, its reply is addressed to your ephemeral port (say 50000) and arrives from an Internet source address, so it is evaluated against your inbound rules like any other incoming packet. This is why you need to allow high-range ports incoming from 0.0.0.0/0 to allow outgoing traffic.
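
To check which port each side of a TCP connection actually uses, here is a small loopback experiment, consistent with the tcpdump output in the question. It is a sketch under stated assumptions: the listener stands in for a web server, and it binds to an OS-chosen port rather than 80 because binding port 80 normally needs elevated privileges.

```python
import socket

# Stand-in for the web server: listen on an OS-chosen port on loopback.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))  # port 0 asks the OS to pick a free port
server.listen(1)
listen_port = server.getsockname()[1]

# Stand-in for curl: the OS assigns the client an ephemeral source port.
client = socket.create_connection(("127.0.0.1", listen_port))
conn, peer = server.accept()

client_port = client.getsockname()[1]   # client's ephemeral port
accepted_port = conn.getsockname()[1]   # server-side port of the accepted socket

print("server listening port:     ", listen_port)
print("accepted socket local port:", accepted_port)
print("client ephemeral port:     ", client_port)
print("server sees client port:   ", peer[1])

conn.close()
client.close()
server.close()
```

The accepted socket keeps the listening port, and the only high-range port involved is the client's own source port. That is the port the inbound NACL rule has to admit for replies coming from the Internet.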

There's another explanation here.
