AWS VPC – Internet Gateway vs NAT

amazon-nat-gateway amazon-vpc amazon-web-services gateway nat

Three other questions are closely related to mine. Although their answers seem to have cleared up a lot of people's doubts, I am still struggling to understand whether this setup is specific to AWS or applies to networking in general. If it's the latter, I need to revisit my basics. I already suspect that, hence the question.

My understanding

If a private network is connected to the Internet, its hosts need public IPs to be uniquely identifiable on the Internet, and all traffic to the Internet, inbound or outbound, uses this public IP address. A host from this network gets a public IP when it connects to the Internet. A packet originating from this host to, say, www.google.com will carry the host's private address, which ultimately gets replaced with its public IP address by the NAT device (installed on the router/default gateway), as illustrated here. This is how most of the Internet (except IPv6) runs.

Now, in AWS

  • when you create a Public Subnet and enable auto-assign Public IPs, you are essentially telling the Internet Gateway to swap the EC2 instance's private address for its public IP address in the packets the instance sends out to the Internet, and vice versa on the way in. Is my understanding right?
  • when you create a Private Subnet (by not attaching it to the Internet Gateway), you are keeping it private. We then consciously keep auto-assign Public IP disabled, so when we launch EC2 instances inside this private subnet, we do not see any public IPs for them in the EC2 console. This also means that instances in this subnet are not visible to the Internet. Now, if I connect this private subnet to a NAT device (which, of course, sits in the public subnet; please leave aside what a NAT Gateway does better for now), then I am essentially leaving it to the NAT device to figure out which public IP to assign to a specific host X in the private subnet that wants to communicate with the Internet, since a public IP is needed to do so.

    Now,

    • Isn't this something that a router/(Internet) Gateway already does, both in AWS and in general networking? Isn't assigning public IPs to hosts on a network, and replacing the private IP address with the public one in packets leaving that network for the Internet, something a router carries out?
    • Say the NAT device picks the IP 1.2.3.4 for this host in the private subnet. If this IP "somehow" becomes known on the Internet, then this host should become reachable from the Internet too, unless the NAT device pulls some trick (see the follow-up question). Is my understanding right? Now, AWS says that the NAT device does not allow inbound communication. Does that mean that even if the public IP 1.2.3.4 (which the NAT device assigned to this host) becomes known, inbound connections are forcibly blocked? Or does the NAT device simply use its own IP address on behalf of the hosts in the private network (which is not what a NAT device should ideally do; a NAT device takes the assigned private IP and replaces it with the public IP in packets)?
    • Also, AWS lets you enable auto-assign Public IPs on a private subnet too, and I can confirm that I see EC2 instances in a private subnet with a public IP. So now you have a private subnet (not connected to the Internet Gateway in its route table) with instances that have a public IP (because auto-assign Public IPs is enabled on it). How is that supposed to be interpreted?

Best Answer

I think you misunderstand the function of NAT gateway and that leads to all the other confusions.

  • NAT doesn't randomly assign addresses to internal / private hosts.

  • A NAT device usually has two interfaces: an internal one with a private IP, e.g. 10.0.0.1, and an external one with a public IP, e.g. 1.2.3.4.

  • Hosts on the internal network that have, for example, some 10.0.0.x address (and no public IP) send all their outbound traffic to the NAT gateway. The NAT gateway replaces the source IP in each packet (e.g. 10.0.0.123) with its own public IP (i.e. 1.2.3.4) and then sends the packet on to its destination on the internet, e.g. Google.

  • TCP packets carry not only source and destination addresses but also source and destination ports. The NAT gateway may also replace the source port to avoid collisions when multiple hosts happen to use the same source port.
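Sending "all the outbound traffic to the NAT gateway" is just an ordinary routing decision. Below is a minimal Python sketch of that decision using longest-prefix match; the route table, the 10.0.0.0/24 subnet and the NAT device's internal address 10.0.0.1 are assumptions taken from the examples above, and `next_hop` is a hypothetical helper, not an AWS or OS API. In a VPC the equivalent decision is made by the subnet's route table, with a 0.0.0.0/0 route targeting the NAT gateway.

```python
import ipaddress

# Hypothetical route table for a host in 10.0.0.0/24.
# "local" means deliver directly on the LAN; everything else goes to the
# NAT device's internal interface (assumed to be 10.0.0.1).
ROUTES = [
    (ipaddress.ip_network("10.0.0.0/24"), "local"),
    (ipaddress.ip_network("0.0.0.0/0"), "10.0.0.1"),
]

def next_hop(destination: str) -> str:
    """Pick the next hop for a destination using longest-prefix match."""
    addr = ipaddress.ip_address(destination)
    matches = [(net, hop) for net, hop in ROUTES if addr in net]
    # The most specific matching network (largest prefix length) wins;
    # the default route 0.0.0.0/0 always matches, so the list is never empty.
    _, hop = max(matches, key=lambda m: m[0].prefixlen)
    return hop

print(next_hop("10.0.0.99"))       # local
print(next_hop("216.58.203.110"))  # 10.0.0.1
```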

NAT Explained:

An internal host 10.0.0.123 wants to download something from google.com at 216.58.203.110 over HTTPS, i.e. port 443.

  1. It sends a packet with source 10.0.0.123:12345 (address : random local port) and destination 216.58.203.110:443 (address : HTTPS port) to the NAT gateway, because that is its next hop for all addresses that are not 10.0.0.x.

  2. The NAT gateway replaces the source 10.0.0.123:12345 with its own public IP and some random port, e.g. 1.2.3.4:54321.

  3. It records that a connection from 10.0.0.123:12345 to 216.58.203.110:443 has been translated to 1.2.3.4:54321 in its connection tracking table.

  4. When a return packet from Google arrives at the NAT gateway with destination 1.2.3.4:54321, the gateway looks up that record (address:port), sees that it should translate the destination back to 10.0.0.123:12345, and forwards the packet to that host on that port.
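Steps 1 to 4 can be sketched as a small connection-tracking table. This is a toy Python model under stated assumptions: `Packet` and `SourceNat` are hypothetical names, ports are handed out sequentially from 54321 only so the output matches the example, and real NAT devices also track the protocol, TCP state and idle timeouts.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Packet:
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int

class SourceNat:
    """Toy source NAT with a connection-tracking table (no TCP state or timeouts)."""

    def __init__(self, public_ip: str, first_port: int = 54321):
        self.public_ip = public_ip
        self.next_port = first_port
        self.out_map = {}  # (priv_ip, priv_port, dst_ip, dst_port) -> public_port
        self.in_map = {}   # (public_port, dst_ip, dst_port) -> (priv_ip, priv_port)

    def outbound(self, pkt: Packet) -> Packet:
        flow = (pkt.src_ip, pkt.src_port, pkt.dst_ip, pkt.dst_port)
        if flow not in self.out_map:
            # Steps 2-3: pick an unused public port and record the mapping.
            public_port = self.next_port
            self.next_port += 1
            self.out_map[flow] = public_port
            self.in_map[(public_port, pkt.dst_ip, pkt.dst_port)] = (pkt.src_ip, pkt.src_port)
        # Only the source is rewritten; the destination is untouched.
        return Packet(self.public_ip, self.out_map[flow], pkt.dst_ip, pkt.dst_port)

    def inbound(self, pkt: Packet) -> Packet:
        # Step 4: translate the reply's destination back to the internal host.
        priv_ip, priv_port = self.in_map[(pkt.dst_port, pkt.src_ip, pkt.src_port)]
        return Packet(pkt.src_ip, pkt.src_port, priv_ip, priv_port)

nat = SourceNat("1.2.3.4")
out = nat.outbound(Packet("10.0.0.123", 12345, "216.58.203.110", 443))
print(out)  # Packet(src_ip='1.2.3.4', src_port=54321, dst_ip='216.58.203.110', dst_port=443)
reply = Packet("216.58.203.110", 443, "1.2.3.4", 54321)
print(nat.inbound(reply))  # Packet(src_ip='216.58.203.110', src_port=443, dst_ip='10.0.0.123', dst_port=12345)
```

Further packets on the same flow reuse the recorded port instead of allocating a new one, which is what keeps a connection stable across many packets.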

If at the same time another host on the local network (e.g. 10.0.0.99) attempts to download something from Google, this is what happens:

  1. The NAT gateway again translates the source IP to its own public IP 1.2.3.4, but the source port will be different from before, e.g. 56789.

  2. Now Google sees two connections from our NAT gateway:

    • 1.2.3.4:54321 - to - 216.58.203.110:443 (where the NAT gateway knows that the original source is in fact 10.0.0.123:12345)
    • 1.2.3.4:56789 - to - 216.58.203.110:443 (the NAT GW knows that the original source is 10.0.0.99:12345).
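Why the port has to change can be shown with the two flows above, which differ only in their private source IP: if the NAT kept port 12345 for both, the outside tuples would be identical and replies could not be told apart. The Python sketch below uses one possible allocation strategy (keep the host's port if the external tuple is still unique, otherwise take the first free port); the strategy, the port range and `assign_public_ports` are assumptions for illustration, and the answer's example simply uses arbitrary ports such as 54321 and 56789.

```python
from itertools import chain

PUBLIC_IP = "1.2.3.4"

def assign_public_ports(flows, port_range=range(1024, 65536)):
    """Map each (src_ip, src_port, dst_ip, dst_port) flow to a public source port.

    Hypothetical strategy: keep the host's own port if that still gives a unique
    external tuple, otherwise fall back to the first free port in port_range.
    """
    used = set()  # (public_port, dst_ip, dst_port) tuples already in use
    mapping = {}
    for src_ip, src_port, dst_ip, dst_port in flows:
        for port in chain([src_port], port_range):
            if (port, dst_ip, dst_port) not in used:
                break
        else:
            raise RuntimeError("no free public port for this destination")
        used.add((port, dst_ip, dst_port))
        mapping[(src_ip, src_port, dst_ip, dst_port)] = port
    return mapping

flows = [
    ("10.0.0.123", 12345, "216.58.203.110", 443),
    ("10.0.0.99", 12345, "216.58.203.110", 443),  # same source port, different host
]
for flow, port in assign_public_ports(flows).items():
    print(f"{flow[0]}:{flow[1]} -> {PUBLIC_IP}:{port} -> {flow[2]}:{flow[3]}")
# 10.0.0.123:12345 -> 1.2.3.4:12345 -> 216.58.203.110:443
# 10.0.0.99:12345 -> 1.2.3.4:1024 -> 216.58.203.110:443
```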

Both connections appear to come from the NAT gateway but in fact they were initiated from different hosts in the internal network. Only the NAT gateway knows the mapping.

That's it in a nutshell.


Now a couple of notes:

  • You can't initiate connections from outside to a host behind NAT. You can only send replies to packets initiated from inside.

    That's because the mapping between internal IP:port and NAT gateway IP:port is done when the internal host sends the first packet out.

    If you wanted to SSH to 10.0.0.123:22 from outside, how would you do it? You could send the SSH packet to the NAT gateway's IP 1.2.3.4, but to which port? See, there is no mapping, so it's not possible to initiate a connection from outside.

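  • Routers, on the other hand, do not change any IPs or ports (as opposed to NAT gateways). They pass packets through pretty much unchanged.

  • In the AWS context, the router is the IGW (Internet Gateway). NAT can be either a NAT Gateway or a NAT Instance; they do the same thing. One AWS-specific caveat: for an instance that has a public IPv4 address, AWS documents that the IGW performs a static one-to-one translation between that public address and the instance's private address, but it does not do the many-to-one port translation described above.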
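The first note falls straight out of the connection table: an unsolicited inbound packet matches no record, so there is nothing to translate it to. For contrast, the Python sketch below also models the static one-to-one mapping that AWS documents for the Internet Gateway when an instance has a public IPv4 address; there, inbound connections do have a target (whether they are then allowed is up to security groups and network ACLs). Both tables, the helper functions and the 203.0.113.10 / 10.0.0.50 pair are hypothetical.

```python
# Port address translation (NAT Gateway / NAT Instance): entries exist only for
# connections that an internal host opened first.
pat_table = {
    # (public_port, remote_ip, remote_port) -> (private_ip, private_port)
    (54321, "216.58.203.110", 443): ("10.0.0.123", 12345),
}

def pat_inbound(public_port, remote_ip, remote_port):
    """Return the internal (ip, port) for an inbound packet, or None to drop it."""
    return pat_table.get((public_port, remote_ip, remote_port))

# One-to-one NAT: a fixed public <-> private address pair. Ports are left
# alone, so any inbound port has a target. Addresses are hypothetical.
one_to_one = {"203.0.113.10": "10.0.0.50"}

def one_to_one_inbound(public_ip, port):
    """Return the internal (ip, port) for an inbound packet, or None to drop it."""
    private_ip = one_to_one.get(public_ip)
    return None if private_ip is None else (private_ip, port)

print(pat_inbound(54321, "216.58.203.110", 443))  # ('10.0.0.123', 12345): a reply
print(pat_inbound(22, "198.51.100.7", 50000))     # None: unsolicited SSH, dropped
print(one_to_one_inbound("203.0.113.10", 22))     # ('10.0.0.50', 22): has a target
```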

Hope that explains it :)