Connect to RDS from EC2 instance in a different Availability Zone (AZ)

amazon-ec2 amazon-rds amazon-vpc amazon-web-services

OK, so I have a VPC with three app servers and an instance of Postgres in RDS.

I have a security group called 'rds-staging' that allows inbound connections on port 5432 from a security group called 'app-elb-staging'.

'app-elb-staging' is the security group applied to all of my EC2 instances, and it allows outgoing traffic to go anywhere.

The RDS instance is in AZ us-east-1e. I can connect to it from my EC2 instance in us-east-1e (10.0.3.*), but not from any EC2 instances in us-east-1a (10.0.1.*) or us-east-1c (10.0.2.*):

deploy@ip-10-0-3-220:~$ nc -zv xxx.us-east-1.rds.amazonaws.com 5432
Connection to xxx.us-east-1.rds.amazonaws.com 5432 port [tcp/postgresql] succeeded!

deploy@ip-10-0-1-155:~$ nc -zv xxx.us-east-1.rds.amazonaws.com 5432
nc: connect to xxx.us-east-1.rds.amazonaws.com port 5432 (tcp) failed: No route to host

deploy@ip-10-0-2-90:~$ nc -zv xxx.us-east-1.rds.amazonaws.com 5432
nc: connect to xxx.us-east-1.rds.amazonaws.com port 5432 (tcp) failed: No route to host

Has anyone seen this before? I've checked the DNS, and each machine is resolving the hostname to the same IP (10.0.3.x).

Best Answer

OK, I finally figured out the root cause of this issue. The AMI I was using created a bridge interface (lxcbr0) whose address range collided with the IPs of my subnets, and that broke the connections. The output of sudo route -n looked like this on an affected instance:

ubuntu@ip-10-0-1-92:~$ sudo route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         10.0.1.1        0.0.0.0         UG    0      0        0 eth0
10.0.1.0        0.0.0.0         255.255.255.0   U     0      0        0 eth0
10.0.2.0        0.0.0.0         255.255.255.0   U     0      0        0 lxcbr0

Because the kernel sends traffic for 10.0.2.0/24 out through lxcbr0 instead of eth0, any connection to 10.0.2.* would then fail:

deploy@ip-10-0-1-92:~$ nc -zv 10.0.2.53 22
nc: connect to 10.0.2.53 port 22 (tcp) failed: No route to host
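
To see why a given address ends up on the wrong interface, it can help to replay the kernel's longest-prefix match against the routing table. The following is only a rough sketch: route_for is a hypothetical helper name, it assumes route -n style text on stdin (destination in column 1, netmask in column 3, interface in the last column), and it assumes contiguous netmasks.

```shell
# Hypothetical helper: print the destination and interface of the most
# specific route in `route -n`-style input (stdin) that covers the IPv4
# address given as $1. Assumes contiguous netmasks.
route_for() {
  awk -v target="$1" '
    # Convert a dotted-quad address to a plain number.
    function ip2int(ip,   p) {
      split(ip, p, ".")
      return ((p[1] * 256 + p[2]) * 256 + p[3]) * 256 + p[4]
    }
    BEGIN { best = -1 }
    # Only consider lines whose first column looks like an IPv4 address,
    # which skips the two header lines.
    $1 ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/ {
      mask  = ip2int($3)
      block = 4294967296 - mask   # number of addresses the route covers
      if (int(ip2int(target) / block) == int(ip2int($1) / block) && mask > best) {
        best = mask; dest = $1; iface = $NF
      }
    }
    END { if (iface != "") print dest, iface }'
}
```

On the affected instance above, sudo route -n | route_for 10.0.2.53 would print 10.0.2.0 lxcbr0, while an address in 10.0.3.* would fall through to the default route on eth0.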

Taking the bridge down with sudo ifconfig lxcbr0 down resolved the issue, but switching to an AMI that does not set up this bridge in the first place fixed the root cause.
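
For anyone checking their own instances for the same problem, here is a minimal sketch that lists routes attached to anything other than the primary interface. The name find_shadow_routes is made up for illustration, and it assumes route -n style text on stdin and that eth0 is the interface facing the VPC. Any route it prints is only a candidate to compare against your subnet ranges by hand, not proof of a conflict.

```shell
# Hypothetical helper: from `route -n`-style input on stdin, print the
# destination, netmask and interface of every non-default route that does
# not use the primary interface ($1, default eth0).
find_shadow_routes() {
  primary="${1:-eth0}"
  awk -v primary="$primary" '
    # Skip header lines and the default route; keep routes on other interfaces.
    $1 ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/ && $1 != "0.0.0.0" && $NF != primary {
      print $1, $3, $NF
    }'
}
```

Run as sudo route -n | find_shadow_routes eth0. With the routing table shown above it prints 10.0.2.0 255.255.255.0 lxcbr0, which is the route that shadows the 10.0.2.* subnet.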