Unable to ping or ssh between AWS VPC subnets

amazon-vpc amazon-web-services subnet

I have a fairly standard multi-tier subnet layout in a VPC. There is a database tier/subnet, a web server tier/subnet and a bastion host tier/subnet. My problem is that I cannot ping or ssh between subnets.

In particular I would like to ping and ssh from the bastion tier/subnet into the web server tier/subnet.

172.31.32.0/20 bastion-tier
172.31.0.0/20 webserver-tier

Both subnets are in the same availability zone and both subnets are attached to the same route table. The route table looks like this:

172.31.0.0/16 local
0.0.0.0/0 igw-xxxxxxxx
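As a quick sanity check on the routing, here is a minimal Python sketch that applies longest-prefix matching to the two routes above. The `route_target` helper is a hypothetical name used only for illustration; the CIDRs and targets are copied from the question. It suggests that traffic between the two subnets should match the `local` route rather than the internet gateway.

```python
import ipaddress

# Values copied from the question.
routes = {
    "172.31.0.0/16": "local",
    "0.0.0.0/0": "igw-xxxxxxxx",
}
subnets = {
    "bastion-tier": "172.31.32.0/20",
    "webserver-tier": "172.31.0.0/20",
}


def route_target(dest_ip, routes):
    """Return the target of the most specific (longest-prefix) matching route."""
    ip = ipaddress.ip_address(dest_ip)
    matches = [
        (ipaddress.ip_network(cidr), target)
        for cidr, target in routes.items()
        if ip in ipaddress.ip_network(cidr)
    ]
    if not matches:
        return None
    # The route with the longest prefix wins.
    return max(matches, key=lambda m: m[0].prefixlen)[1]


for name, cidr in subnets.items():
    # Use the first host address of each subnet as a sample destination.
    sample = next(ipaddress.ip_network(cidr).hosts())
    print(name, sample, "->", route_target(str(sample), routes))
```

Both sample addresses resolve to `local`, so the route table itself does not look like the cause.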

At present the network ACLs for the webserver-tier permit ALL traffic, ALL protocols, ALL port ranges from 172.31.32.0/20, which is the bastion tier. The outbound/egress rules permit all traffic. The security groups are likewise wide open. Here are the inbound network ACL rules for the webserver-tier:

RULE #  TYPE          PROTOCOL  PORT RANGE  SOURCE          ALLOW/DENY
100     ALL Traffic   ALL       ALL         172.31.32.0/20  ALLOW
200     HTTP (80)     TCP (6)   80          0.0.0.0/0       ALLOW
202     HTTP* (8080)  TCP (6)   8080        0.0.0.0/0       ALLOW
210     HTTPS (443)   TCP (6)   443         0.0.0.0/0       ALLOW
*       ALL Traffic   ALL       ALL         0.0.0.0/0       DENY
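To see what this inbound table does with a given packet, here is a small Python sketch of network ACL evaluation: rules are checked in ascending rule-number order and the first match decides, with the `*` rule as the final deny. The `evaluate` function and the tuple layout are assumptions made for illustration, not an AWS API, and the addresses `172.31.32.10` and `203.0.113.7` are made-up examples.

```python
import ipaddress

# Inbound rules for the webserver-tier, copied from the table above.
# (rule number, protocol, (port_from, port_to) or None for all ports,
#  source CIDR, action)
INBOUND_RULES = [
    (100, "all", None, "172.31.32.0/20", "allow"),
    (200, "tcp", (80, 80), "0.0.0.0/0", "allow"),
    (202, "tcp", (8080, 8080), "0.0.0.0/0", "allow"),
    (210, "tcp", (443, 443), "0.0.0.0/0", "allow"),
]


def evaluate(rules, protocol, port, source_ip):
    """Return the action of the lowest-numbered matching rule, else deny."""
    src = ipaddress.ip_address(source_ip)
    for _number, rule_proto, ports, cidr, action in sorted(rules, key=lambda r: r[0]):
        if rule_proto != "all" and rule_proto != protocol:
            continue
        if ports is not None and (port is None or not ports[0] <= port <= ports[1]):
            continue
        if src not in ipaddress.ip_network(cidr):
            continue
        return action
    return "deny"  # the "*" catch-all rule


# ssh and ping from a (hypothetical) bastion address, then ssh from outside.
print(evaluate(INBOUND_RULES, "tcp", 22, "172.31.32.10"))
print(evaluate(INBOUND_RULES, "icmp", None, "172.31.32.10"))
print(evaluate(INBOUND_RULES, "tcp", 22, "203.0.113.7"))
```

Under this model, ssh and ICMP from the bastion subnet match rule 100 and are admitted, which suggests the webserver-tier inbound rules are not what blocks the traffic.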

I have tried ping and ssh across subnets with both subnets attached to the default/main route table, and I have also tried with the web server subnet attached to its own route table. When I open any of these subnets to traffic from my laptop's IP address, I am able to ssh in successfully via the public IP addresses of the instances.

I have seen information online that implies odd or buggy behavior within AWS VPCs: for instance, problems when Elastic IPs are created via the VPC console but assigned via the EC2 console, after which traffic disappears as if into a black hole. The solution seemed to be to delete the buggy EIP, then recreate and assign a fresh one entirely via either the VPC or the EC2 console. However, this is at best indirect evidence of general AWS bugginess, since in my case there are no EIPs involved.

My next troubleshooting measure is to start over with a new VPC, create two subnets, spin up a server instance in each, and then test ping and ssh between them. A single route table, and wide-open network ACLs and security groups: it can't get simpler than that.
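For the ssh half of that test, a repeatable check is easier to compare across runs than an interactive login. Below is a minimal Python sketch using only the standard library; `tcp_port_open` is a hypothetical helper name, and it only tells you whether a TCP handshake completes, not why it failed.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return False
```

Run from the bastion instance, a call such as `tcp_port_open("172.31.0.10", 22)` would probe ssh on a web server, where `172.31.0.10` is a made-up private address standing in for the real one. A timeout rather than an immediate refusal would be consistent with packets being dropped somewhere along the path.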

This seems to me to be a basic setup, and so I suspect that there is a basic solution that I am missing. Any thoughts? Please and thank you!

Best Answer

You need a route and permissive network ACLs for traffic from the webserver-tier subnet back to the bastion-tier subnet, or your response packets will never make it back to the bastion instance. Network ACLs are stateless, so return traffic has to be allowed explicitly. Enable ICMP from the webserver subnet to the bastion subnet (and make sure your ping client is using ICMP only; some use UDP packets by default), and open the appropriate ephemeral TCP ports from webserver-tier to bastion-tier. The ephemeral range is generally OS-dependent; see http://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/VPC_ACLs.html#VPC_ACLs_Ephemeral_Ports.
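To make the ephemeral-port point concrete, here is a rough Python sketch that checks whether a set of allowed TCP port ranges fully covers the range a client OS may pick its source port from. The `covers` helper and the two example rule sets are hypothetical, and the ranges in `EPHEMERAL_RANGES` are the ones I recall from the AWS documentation linked above, so verify them for your OS before relying on them.

```python
# Ephemeral port ranges as I recall them from the AWS network ACL docs;
# treat these as illustrative and check the current documentation.
EPHEMERAL_RANGES = {
    "linux": (32768, 61000),
    "windows_2008_plus": (49152, 65535),
}


def covers(allowed_ranges, needed):
    """True if every port in `needed` falls inside one of `allowed_ranges`."""
    lo, hi = needed
    for start, end in sorted(allowed_ranges):
        if start > lo:
            break  # there is a gap before the next uncovered port
        if end >= lo:
            lo = end + 1  # everything up to `end` is now covered
            if lo > hi:
                return True
    return lo > hi


# Hypothetical inbound TCP rules on the bastion-tier network ACL.
ssh_only = [(22, 22)]
with_ephemeral = [(22, 22), (1024, 65535)]

for os_name, port_range in EPHEMERAL_RANGES.items():
    print(os_name, covers(ssh_only, port_range), covers(with_ephemeral, port_range))
```

With only port 22 open inbound on the bastion side, the ssh replies (which arrive on an ephemeral destination port) would not be covered; adding a range such as 1024-65535 covers both example operating systems.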

If you run tcpdump on a webserver instance and a bastion instance simultaneously, you will likely see that the webserver is receiving the bastion's packets and sending a response, but the bastion instance is never getting that response.