Set Up AWS Network ACL for SSH to Private Subnet

access-control-list, amazon-vpc, amazon-web-services, ssh

I'm doing a course on AWS and am trying to set up a VPC with two Linux servers. I've set up the VPC with two subnets and put one server in each. The idea is that one subnet is public and the other is private.

I've created two Network ACLs and associated one with each subnet.

I can SSH from my machine to the server in the public subnet. When I try to SSH from that server to the server in my private subnet, I get a connection timeout.

I'm not sure what rules I need in my two Network ACLs to get SSH working. Can anyone help? Given that I'm learning, I'd appreciate an explanation of why the rules should be what they are, not just what they should be.

I have a VPC called MyVPC with CIDR 10.0.0.0/16.
My first subnet is called MyVPCSub1, with CIDR 10.0.1.0/24.
My second subnet is called MyVPCSub2, with CIDR 10.0.2.0/24.
I have a route table called MyInternetRoute associated with MyVPCSub1. Its routes are:

| Destination | Target |
| 10.0.0.0/16 | local  |
| 0.0.0.0/0   | igw    |

I have a route table called MyPrivate associated with MyVPCSub2. Its routes are:

| Destination | Target |
| 10.0.0.0/16 | local  |

I have a Network ACL called MyWeb associated with MyVPCSub1 with rules:

Inbound:

| #   | Type | Protocol | Ports | Source     | A/D
| 99  | HTTP | TCP      | 80    | {My IP}/32 | D
| 100 | HTTP | TCP      | 80    | 0.0.0.0/0  | A
| 200 | HTTPS| TCP      | 443   | 0.0.0.0/0  | A
| 300 | SSH  | TCP      | 22    | {My IP}/32 | A
| *   | ALL  | ALL      | ALL   | 0.0.0.0/0  | D

Outbound:

| #   | Type   | Protocol | Ports      | Destination | A/D
| 50  | ALL    | ALL      | ALL        | 0.0.0.0/0  | A
| 100 | HTTP   | TCP      | 80         | 0.0.0.0/0  | A
| 200 | HTTPS  | TCP      | 443        | 0.0.0.0/0  | A
| 300 | Custom | TCP      | 1024-65535 | 0.0.0.0/32 | A
| *   | ALL    | ALL      | ALL        | 0.0.0.0/0  | D

I have a Network ACL called MyPrivate associated with MyVPCSub2 with rules:

Inbound:

| #   | Type | Protocol | Ports | Source    | A/D
| 100 | ALL  | ALL      | ALL   | 0.0.0.0/0 | A
| *   | ALL  | ALL      | ALL   | 0.0.0.0/0 | D

Outbound:

| #   | Type | Protocol | Ports | Destination | A/D
| 100 | ALL  | ALL      | ALL   | 0.0.0.0/0 | A
| *   | ALL  | ALL      | ALL   | 0.0.0.0/0 | D

Best Answer

The first thing is to define what some of the terms mean.

NACLs (Network Access Control Lists) - these are stateless packet filters applied at the subnet level. The stateless aspect is important to keep in mind: it means you need to be explicit about all traffic entering and leaving the subnet. For example, with a stateful approach (which is what AWS Security Groups use), you can simply allow inbound TCP/22 for SSH and the return traffic is automatically allowed back out. With NACLs this is not the case: you need a rule in each direction to allow the traffic to pass.
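To make the stateless point concrete, here is a minimal Python sketch of how a NACL evaluates its rules: lowest rule number first, the first match decides, and the `*` rule denies anything unmatched. The rule's port range is compared with the packet's destination port. The `Rule` type, the `nacl_allows` helper, the client address and the ephemeral port 50000 are all hypothetical, and only IPv4 and TCP are modelled.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Rule:
    number: int                       # rules are evaluated lowest number first
    protocol: str                     # "tcp" or "all"
    ports: Optional[Tuple[int, int]]  # inclusive destination-port range; None = all
    cidr: str                         # source (inbound) or destination (outbound)
    allow: bool


def nacl_allows(rules: List[Rule], protocol: str, dst_port: int, peer_ip: str) -> bool:
    """First matching rule wins; if nothing matches, the '*' rule denies."""
    ip = ipaddress.ip_address(peer_ip)
    for rule in sorted(rules, key=lambda r: r.number):
        if rule.protocol not in ("all", protocol):
            continue
        if rule.ports is not None and not rule.ports[0] <= dst_port <= rule.ports[1]:
            continue
        if ip in ipaddress.ip_network(rule.cidr):
            return rule.allow
    return False


# Hypothetical NACL: SSH is allowed in, and only port 22 is allowed out.
inbound = [Rule(100, "tcp", (22, 22), "0.0.0.0/0", True)]
outbound = [Rule(100, "tcp", (22, 22), "0.0.0.0/0", True)]

client = "203.0.113.10"  # hypothetical SSH client; its ephemeral port is 50000

# Client -> server: destination port 22, checked against the inbound rules.
request_ok = nacl_allows(inbound, "tcp", 22, client)
# Server -> client reply: destination port 50000, checked against the outbound rules.
reply_ok = nacl_allows(outbound, "tcp", 50000, client)
print(request_ok, reply_ok)  # True False: the NACL does not remember the request

# A stateless filter needs an explicit rule for the return direction.
outbound.append(Rule(200, "tcp", (1024, 65535), "0.0.0.0/0", True))
print(nacl_allows(outbound, "tcp", 50000, client))  # True
```

The request and its reply are judged completely independently, which is exactly why each direction needs its own rule.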

Security Groups - these are groups of stateful rules that can be applied to one or more instances in a VPC. Note that they apply at the instance level. A Security Group can be compared to a traditional stateful firewall, but because it is applied at the individual instance level, you can segregate instances from each other even within the same subnet, which is nice. And because Security Groups are stateful, if you want to allow traffic into a server (for example TCP/22 for SSH), you don't have to worry about creating a corresponding outbound rule for the return traffic; the platform takes care of that automatically. That makes them much easier to manage, which also means less chance of errors.
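For contrast, here is a toy model of stateful behaviour: the filter records each allowed inbound connection and lets out only packets that are replies to one of them. This is a hypothetical illustration of connection tracking, not how AWS implements Security Groups. It also deliberately has no outbound rules at all, whereas a real Security Group allows all outbound traffic by default; the addresses and ports are made up.

```python
from dataclasses import dataclass, field
from typing import Set, Tuple

Flow = Tuple[str, str, int, str, int]  # (protocol, src_ip, src_port, dst_ip, dst_port)


@dataclass
class StatefulFilter:
    """Toy stateful filter: inbound (protocol, port) rules only, no outbound rules."""
    inbound_rules: Set[Tuple[str, int]]
    tracked: Set[Flow] = field(default_factory=set)

    def inbound(self, flow: Flow) -> bool:
        proto, _src, _sport, _dst, dport = flow
        if (proto, dport) in self.inbound_rules:
            self.tracked.add(flow)  # remember the connection
            return True
        return False

    def outbound(self, flow: Flow) -> bool:
        proto, src, sport, dst, dport = flow
        # A reply is a tracked inbound flow with source and destination swapped.
        return (proto, dst, dport, src, sport) in self.tracked


sg = StatefulFilter(inbound_rules={("tcp", 22)})
request = ("tcp", "203.0.113.10", 50000, "10.0.1.10", 22)
reply = ("tcp", "10.0.1.10", 22, "203.0.113.10", 50000)
unrelated = ("tcp", "10.0.1.10", 22, "198.51.100.7", 40000)

print(sg.inbound(request))     # True: matches the TCP/22 inbound rule
print(sg.outbound(reply))      # True: allowed only because the request was tracked
print(sg.outbound(unrelated))  # False: no outbound rules, and not a reply
```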

There is a nice table which compares these two: VPC Security Comparison

There is also a nice diagram on that page showing the order in which these are applied to traffic, depending on the direction of flow, so check that out.

Then in terms of subnets we have:

Public subnet - in AWS terms, this is simply a subnet whose associated route table has a 0.0.0.0/0 route via an attached Internet Gateway.

Private subnet - this is the opposite, i.e. its route table doesn't have a 0.0.0.0/0 route via an attached Internet Gateway. Note that it can still have a 0.0.0.0/0 route via a NAT Gateway or a similar proxy in your environment, just not a direct one.
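Since the public/private distinction depends only on the route table, it can be checked mechanically. A minimal sketch, assuming routes are (destination, target) pairs, that Internet Gateway IDs start with `igw-` and NAT Gateway IDs with `nat-`, and ignoring IPv6 (::/0) routes; the gateway IDs below are made up:

```python
from typing import List, Tuple

Route = Tuple[str, str]  # (destination CIDR, target ID)


def is_public(routes: List[Route]) -> bool:
    """Public means a default route whose target is an Internet Gateway."""
    return any(dest == "0.0.0.0/0" and target.startswith("igw-")
               for dest, target in routes)


# The two route tables from the question; the gateway IDs are hypothetical.
my_internet_route = [("10.0.0.0/16", "local"), ("0.0.0.0/0", "igw-0example")]
my_private = [("10.0.0.0/16", "local")]
# A private subnet can still reach the internet indirectly through a NAT gateway.
private_with_nat = [("10.0.0.0/16", "local"), ("0.0.0.0/0", "nat-0example")]

print(is_public(my_internet_route), is_public(my_private), is_public(private_with_nat))
# True False False
```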

The question is: when you have both NACLs and Security Groups, which do you use? AWS describes NACLs as an "optional layer of security for your VPC". And it is true that in general Security Groups are sufficient: they are more flexible and provide the same protection. In my experience, however, there are some typical cases where I see NACLs used:

  1. Black holes for known bad actors - if you've been attacked from a particular IP range, it's easy to just add a NACL rule that blocks that source IP/subnet completely.
  2. As a way of delegating control to teams - the security team applies broad-brush NACL configurations (for example, only allowing traffic from trusted corporate networks) and then lets ops/dev teams configure their own Security Group rules. That way, even if an engineer tries to open up the security group on their instance to the internet for testing, the NACL will block it. You can use IAM to restrict who can modify NACLs while granting teams control over everything else, so NACLs act as a great backstop against misconfiguration in larger environments.
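A black-hole rule works because of rule ordering: a deny numbered lower than the broad allow is evaluated first. A minimal sketch with hypothetical rule numbers and addresses (protocol matching is left out and TCP is assumed):

```python
import ipaddress
from typing import List, Optional, Tuple

# (rule number, destination-port range or None for all, source CIDR, allow?)
Rule = Tuple[int, Optional[Tuple[int, int]], str, bool]


def inbound_allowed(rules: List[Rule], dst_port: int, src_ip: str) -> bool:
    """Evaluate rules lowest number first; the first match decides, else deny."""
    ip = ipaddress.ip_address(src_ip)
    for _number, ports, cidr, allow in sorted(rules, key=lambda r: r[0]):
        if ports is not None and not ports[0] <= dst_port <= ports[1]:
            continue
        if ip in ipaddress.ip_network(cidr):
            return allow
    return False


rules: List[Rule] = [
    (100, (443, 443), "0.0.0.0/0", True),  # HTTPS open to everyone
    (50, None, "198.51.100.0/24", False),  # black hole, numbered below the allow
]

print(inbound_allowed(rules, 443, "198.51.100.25"))  # False: rule 50 matches first
print(inbound_allowed(rules, 443, "203.0.113.10"))   # True: falls through to rule 100
```

The inbound rule #99 in the question uses the same pattern, denying HTTP from one address ahead of rule #100's allow.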

AWS also provides guidance on a number of configuration scenarios here: Recommended Network ACL Rules for Your VPC

My guidance, though, is that Security Groups typically provide suitable protection, are easier to understand and configure, and are more flexible and granular in their application. NACLs do give you that extra backstop against human error or for more advanced configurations, but for basic use they are not typically needed. I assume that is why AWS refers to them as "optional".

I would leave NACLs in their default configuration (allow all traffic in and out) and focus on Security Groups for now, because using NACLs as a second layer adds complexity that is probably not needed in your scenario. From a learning perspective, it is good to know that they exist, that they are stateless, that they apply at the subnet level, and that, for traffic entering a subnet, they are applied after the routing decision and before Security Groups.
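The inbound order can be sketched as two checks in sequence: a packet the NACL drops never reaches the Security Group, and a packet must pass both to be delivered. The packet shape and the three policies below are hypothetical, and the routing step is assumed to have already happened.

```python
from typing import Callable, Dict

Packet = Dict[str, int]
Check = Callable[[Packet], bool]


def inbound_verdict(nacl_allows: Check, sg_allows: Check, packet: Packet) -> str:
    """Subnet NACL first, then the instance's Security Group; both must allow."""
    if not nacl_allows(packet):
        return "dropped by NACL"
    if not sg_allows(packet):
        return "dropped by security group"
    return "delivered"


default_nacl: Check = lambda p: True                 # default NACL: allow all
deny_all_nacl: Check = lambda p: False               # hypothetical locked-down NACL
ssh_only_sg: Check = lambda p: p["dst_port"] == 22   # hypothetical SG: SSH only

print(inbound_verdict(default_nacl, ssh_only_sg, {"dst_port": 22}))   # delivered
print(inbound_verdict(default_nacl, ssh_only_sg, {"dst_port": 80}))   # dropped by security group
print(inbound_verdict(deny_all_nacl, ssh_only_sg, {"dst_port": 22}))  # dropped by NACL
```

With the default allow-all NACL, the Security Group alone decides, which is why it is usually enough to focus on Security Groups while learning.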

Regarding your specific situation: because you are using NACLs, you need to remember that they are stateless. Therefore every traffic flow into and out of each subnet needs to be accounted for, which is the main reason Security Groups are so much easier. In your case you have:

  • Traffic into your public server on TCP/22 from your IP: yup, inbound rule #300
  • Return SSH traffic from your public server to the high (ephemeral) port on your machine: yup, outbound rule #50 (but not rule #300, see below)
  • Traffic from your public subnet out to your private subnet on TCP/22: yup, outbound rule #50
  • Traffic into the private subnet from the public one: yup, inbound rule #100 on the private subnet's NACL
  • Return traffic from your private server to the public server, leaving the private subnet: yup, outbound rule #100 on the private subnet's NACL
  • Return traffic from your private server to the public server, entering the public subnet: ah, no. There is no inbound rule allowing traffic from the private server back in to a high port (the ephemeral port the SSH client on the public server uses for its connection to the private one).
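The walk-through above can be replayed against the exact rules from the question. This sketch uses the "lowest number first, first match wins, otherwise deny" evaluation; `{My IP}` is replaced by the documentation address 203.0.113.10, and the two server addresses and the ephemeral port 50000 are hypothetical.

```python
import ipaddress
from typing import List, Optional, Tuple

# (number, protocol, destination-port range or None for all, CIDR, allow?)
Rule = Tuple[int, str, Optional[Tuple[int, int]], str, bool]


def allowed(rules: List[Rule], protocol: str, dst_port: int, peer_ip: str) -> bool:
    """First matching rule (lowest number) decides; otherwise the '*' rule denies."""
    ip = ipaddress.ip_address(peer_ip)
    for _n, proto, ports, cidr, allow in sorted(rules, key=lambda r: r[0]):
        if proto not in ("all", protocol):
            continue
        if ports is not None and not ports[0] <= dst_port <= ports[1]:
            continue
        if ip in ipaddress.ip_network(cidr):
            return allow
    return False


MY_IP = "203.0.113.10/32"  # stand-in for {My IP}
my_web_in = [
    (99, "tcp", (80, 80), MY_IP, False),
    (100, "tcp", (80, 80), "0.0.0.0/0", True),
    (200, "tcp", (443, 443), "0.0.0.0/0", True),
    (300, "tcp", (22, 22), MY_IP, True),
]
my_web_out = [
    (50, "all", None, "0.0.0.0/0", True),
    (100, "tcp", (80, 80), "0.0.0.0/0", True),
    (200, "tcp", (443, 443), "0.0.0.0/0", True),
    (300, "tcp", (1024, 65535), "0.0.0.0/32", True),
]
my_private_in = [(100, "all", None, "0.0.0.0/0", True)]
my_private_out = [(100, "all", None, "0.0.0.0/0", True)]

public_host, private_host = "10.0.1.10", "10.0.2.10"  # hypothetical addresses
ephemeral = 50000  # hypothetical client port chosen by ssh on the public host

hops = {
    "public out, to private:22": allowed(my_web_out, "tcp", 22, private_host),
    "private in, from public:22": allowed(my_private_in, "tcp", 22, public_host),
    "private out, reply to public": allowed(my_private_out, "tcp", ephemeral, public_host),
    "public in, reply from private": allowed(my_web_in, "tcp", ephemeral, private_host),
}
for hop, ok in hops.items():
    print(f"{hop:32} {'allow' if ok else 'DENY'}")
```

Only the last hop is denied: none of MyWeb's inbound rules matches a high destination port from 10.0.2.0/24, so the `*` rule drops the reply and the SSH client eventually times out.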

You need to add a rule like outbound rule #300 (but note you have formatted its CIDR slightly wrong, see below) to the inbound rules of your public subnet's ACL, with the private subnet (10.0.2.0/24) as the source. Then, assuming your Security Groups are configured correctly, you should be good to go.
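Adding the missing rule to a small model of MyWeb's inbound set shows the reply now getting through while everything else stays as restrictive as before. Rule number 400 is an arbitrary choice, the 1024-65535 range is reused from outbound rule #300, `{My IP}` is replaced by 203.0.113.10, and the other addresses are hypothetical.

```python
import ipaddress
from typing import List, Optional, Tuple

# (number, protocol, destination-port range or None for all, source CIDR, allow?)
Rule = Tuple[int, str, Optional[Tuple[int, int]], str, bool]


def allowed(rules: List[Rule], protocol: str, dst_port: int, peer_ip: str) -> bool:
    """First matching rule (lowest number) decides; otherwise the '*' rule denies."""
    ip = ipaddress.ip_address(peer_ip)
    for _n, proto, ports, cidr, allow in sorted(rules, key=lambda r: r[0]):
        if proto not in ("all", protocol):
            continue
        if ports is not None and not ports[0] <= dst_port <= ports[1]:
            continue
        if ip in ipaddress.ip_network(cidr):
            return allow
    return False


MY_IP = "203.0.113.10/32"  # stand-in for {My IP}
my_web_in = [
    (99, "tcp", (80, 80), MY_IP, False),
    (100, "tcp", (80, 80), "0.0.0.0/0", True),
    (200, "tcp", (443, 443), "0.0.0.0/0", True),
    (300, "tcp", (22, 22), MY_IP, True),
    (400, "tcp", (1024, 65535), "10.0.2.0/24", True),  # new: replies from MyVPCSub2
]

private_host = "10.0.2.10"  # hypothetical private server
print(allowed(my_web_in, "tcp", 50000, private_host))    # True: rule 400 matches
print(allowed(my_web_in, "tcp", 50000, "198.51.100.7"))  # False: other sources still denied
print(allowed(my_web_in, "tcp", 22, "198.51.100.7"))     # False: SSH still limited to {My IP}
```

If you script this with boto3 rather than the console, the corresponding EC2 client call is `create_network_acl_entry` with `Egress=False`, `Protocol="6"` (TCP), `RuleAction="allow"`, `CidrBlock="10.0.2.0/24"` and `PortRange={"From": 1024, "To": 65535}`, plus your MyWeb ACL's ID and a rule number.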

Hope that helps.

To add, as per the other answer: rule #300 in the outbound rule set of the public subnet is misformatted. It should be 0.0.0.0/0, not 0.0.0.0/32. In your case, though, you weren't hitting that rule, because rule #50 matches first and allows all traffic anyway. So while rule #300 wouldn't work as intended, it wasn't actually causing your problem.
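The difference between the two CIDRs is easy to check with Python's standard `ipaddress` module: a /32 is a single address, so 0.0.0.0/32 matches only the address 0.0.0.0, while 0.0.0.0/0 matches every IPv4 address. The server address below is hypothetical.

```python
import ipaddress

slash32 = ipaddress.ip_network("0.0.0.0/32")
slash0 = ipaddress.ip_network("0.0.0.0/0")

print(slash32.num_addresses, slash0.num_addresses)  # 1 4294967296

target = ipaddress.ip_address("10.0.2.10")  # hypothetical private server
print(target in slash32, target in slash0)  # False True
```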