Network errors when running `apt-get update` in cloud-init script

amazon-web-services apt autoscaling cloud-init ubuntu-16.04

Yesterday I set up my first Autoscaling Group in AWS. I wrote a cloud-init/userdata script to install my application and tested it about 40 times without any errors. Just before I went home it suddenly stopped working: new instances never become healthy and are eventually terminated once their grace period expires.

This morning I came in and found that the issue was persisting. I SSH'd into an instance, took a look at the cloud-init-output.log file, and found the following:

Err:1 http://ap-southeast-2.ec2.archive.ubuntu.com/ubuntu xenial InRelease
  Could not connect to ap-southeast-2.ec2.archive.ubuntu.com:80 (54.253.131.141), connection timed out [IP: 54.253.131.141 80]
Err:2 http://ap-southeast-2.ec2.archive.ubuntu.com/ubuntu xenial-updates InRelease
  Unable to connect to ap-southeast-2.ec2.archive.ubuntu.com:http: [IP: 54.253.131.141 80]
Err:3 http://ap-southeast-2.ec2.archive.ubuntu.com/ubuntu xenial-backports InRelease
  Unable to connect to ap-southeast-2.ec2.archive.ubuntu.com:http: [IP: 54.253.131.141 80]
Err:4 http://security.ubuntu.com/ubuntu xenial-security InRelease
  Cannot initiate the connection to security.ubuntu.com:80 (2001:67c:1360:8001::21). - connect (101: Network is unreachable) [IP: 2001:67c:1360:8001::21 80]
Reading package lists...
W: Failed to fetch http://ap-southeast-2.ec2.archive.ubuntu.com/ubuntu/dists/xenial/InRelease  Could not connect to ap-southeast-2.ec2.archive.ubuntu.com:80 (54.253.131.141), connection timed out [IP: 54.253.131.141 80]
W: Failed to fetch http://ap-southeast-2.ec2.archive.ubuntu.com/ubuntu/dists/xenial-updates/InRelease  Unable to connect to ap-southeast-2.ec2.archive.ubuntu.com:http: [IP: 54.253.131.141 80]
W: Failed to fetch http://ap-southeast-2.ec2.archive.ubuntu.com/ubuntu/dists/xenial-backports/InRelease  Unable to connect to ap-southeast-2.ec2.archive.ubuntu.com:http: [IP: 54.253.131.141 80]
W: Failed to fetch http://security.ubuntu.com/ubuntu/dists/xenial-security/InRelease  Cannot initiate the connection to security.ubuntu.com:80 (2001:67c:1360:8001::21). - connect (101: Network is unreachable) [IP: 2001:67c:1360:8001::21 80]
W: Some index files failed to download. They have been ignored, or old ones used instead.

This is caused by the `sudo apt-get update` command at the top of my script. Following this, multiple packages in my `sudo apt-get -y install` command fail to install, which then prevents my application from working.

The weird thing is that if I run `sudo apt-get update` via SSH it works without any errors; it's only in the cloud-init script that it fails. My hunch is that maybe the instance hasn't yet connected to the network at the time the script executes. If this is the case, how can I work around this issue?

EDIT: I can no longer reproduce this issue. I've added this to the top of my script to attempt to prevent the issue from re-occurring:

until ping -c1 ap-southeast-2.ec2.archive.ubuntu.com &>/dev/null; do echo "waiting for networking to initialise"; done

But the "waiting for networking to initialise" message isn't present in cloud-init-output.log, so it seems this code isn't doing anything and the issue may have been temporary. If anyone knows what causes this issue and a more reliable way of mitigating it, please let me know.
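
For anyone wanting a bounded version of the wait loop above, here is a minimal sketch. The `retry` helper is a hypothetical name, not something cloud-init provides. It probes over HTTP rather than with ping, on the assumption that a TCP request to port 80 is closer to what apt actually does, and because, as far as I know, `apt-get update` on this release can exit 0 even when index downloads fail, so its exit status alone is not a reliable signal. Note that a retry only helps with transient startup delays; it will simply give up if the instance has no route out at all.

```shell
#!/bin/bash
# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND until it succeeds or MAX_ATTEMPTS is reached.
retry() {
  local max_attempts=$1 delay=$2
  shift 2
  local attempt=1
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after ${attempt} attempts: $*" >&2
      return 1
    fi
    echo "attempt ${attempt} failed, retrying in ${delay}s: $*" >&2
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}

# Example use at the top of a userdata script (mirror URL taken from the log above):
# retry 30 2 curl -fsS --max-time 5 -o /dev/null \
#   http://ap-southeast-2.ec2.archive.ubuntu.com/ubuntu/dists/xenial/InRelease || exit 1
# apt-get update
```

Unlike the unbounded `until` loop, this fails loudly after a fixed number of attempts, so the reason ends up in cloud-init-output.log instead of the instance silently waiting out its grace period.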

Best Answer

I figured out what the issue was and I feel a bit silly. It turns out that an instance needs a public IP in order to reach servers outside the VPC through the internet gateway. I guess I assumed there would be some kind of NAT allowing instances to dial out without a public IP, but I see now that if I want that I have to set it up myself with a NAT Gateway.
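
For reference, a rough sketch of the NAT Gateway route using the AWS CLI. The function name `setup_nat_gateway` and its two arguments (a public subnet ID and the private subnet's route table ID) are placeholders of mine, and this assumes the CLI is already configured with credentials that allow these calls. Treat it as an outline of the steps rather than a tested provisioning script.

```shell
#!/bin/bash
# setup_nat_gateway PUBLIC_SUBNET_ID PRIVATE_ROUTE_TABLE_ID
# Hypothetical helper: creates a NAT gateway in a public subnet and points the
# private route table's default route at it. Prints the NAT gateway ID.
setup_nat_gateway() {
  local public_subnet=$1 private_route_table=$2
  local alloc_id nat_id

  # A public NAT gateway needs an Elastic IP allocation.
  alloc_id=$(aws ec2 allocate-address --domain vpc \
    --query AllocationId --output text) || return 1

  nat_id=$(aws ec2 create-nat-gateway \
    --subnet-id "$public_subnet" --allocation-id "$alloc_id" \
    --query NatGateway.NatGatewayId --output text) || return 1

  # Block until the gateway is usable before routing traffic to it.
  aws ec2 wait nat-gateway-available --nat-gateway-ids "$nat_id" || return 1

  # Send all outbound traffic from the private subnet through the gateway.
  aws ec2 create-route --route-table-id "$private_route_table" \
    --destination-cidr-block 0.0.0.0/0 \
    --nat-gateway-id "$nat_id" >/dev/null || return 1

  echo "$nat_id"
}

# Example (IDs are placeholders):
# setup_nat_gateway subnet-0123456789abcdef0 rtb-0123456789abcdef0
```

If the instances are fine being directly reachable, the simpler alternative is to have the Autoscaling Group launch them into a public subnet with a public IP assigned, which avoids the NAT Gateway entirely.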

The reason this issue was hard to troubleshoot is that in order to SSH in and view the logs I was assigning an Elastic IP to the instance, which then caused the script to succeed.
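
Since attaching an Elastic IP changes the very thing being debugged, one way to capture the state at boot is to have the userdata script log whether the instance has a public IPv4 address before it touches apt. The sketch below queries the EC2 instance metadata service; the function name `public_ipv4` is my own, and it assumes `curl` is installed and that the metadata service answers the `public-ipv4` path with a 404 when no public address is assigned.

```shell
#!/bin/bash
# Base URL of the EC2 instance metadata service.
METADATA_BASE="http://169.254.169.254/latest"

# Hypothetical helper: prints the instance's public IPv4 address,
# or returns non-zero if the metadata service reports none.
public_ipv4() {
  local token
  # Try to get an IMDSv2 session token; fall back to a token-less request.
  token=$(curl -fsS --max-time 2 -X PUT \
    -H "X-aws-ec2-metadata-token-ttl-seconds: 60" \
    "$METADATA_BASE/api/token" 2>/dev/null) || token=""

  # -f makes curl exit non-zero on an HTTP error such as 404.
  if [ -n "$token" ]; then
    curl -fsS --max-time 2 -H "X-aws-ec2-metadata-token: $token" \
      "$METADATA_BASE/meta-data/public-ipv4" 2>/dev/null
  else
    curl -fsS --max-time 2 "$METADATA_BASE/meta-data/public-ipv4" 2>/dev/null
  fi
}

# Example use at the top of a userdata script:
# if ip=$(public_ipv4); then
#   echo "public IPv4 at boot: $ip"
# else
#   echo "no public IPv4 at boot; outbound access needs a NAT" >&2
# fi
```

With that line in cloud-init-output.log, the missing public IP would have been visible even after an Elastic IP was attached for SSH access.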
