Terraform: How to prevent an ASG EC2 instance from coming up before the NAT Gateway is created

terraform

I'm using two modules. One is a custom VPC module and the other is a module to bring up a Jenkins EC2 instance.

You can't use depends_on with modules, but the Jenkins module does rely on certain outputs from the VPC module, like so…

  jenkins_elb_subnets_ids                 = ["${module.vpc.public_subnets_ids[0]}", "${module.vpc.public_subnets_ids[1]}"]
  jenkins_instance_subnets_ids            = ["${module.vpc.private_subnets_ids[0]}", "${module.vpc.private_subnets_ids[1]}"]
  vpc_id                                  = "${module.vpc.vpc_id}"
  vpc_cidr                                = "${var.vpc_cidr}"

This still doesn't prevent the Jenkins EC2 instance from launching before the NAT Gateways are even created.

module.jenkins.aws_launch_configuration.jenkins_lc: Creation complete after 5s (ID: devops-jenkins-lc-20180309131935169800000002)
module.jenkins.aws_autoscaling_group.jenkins_asg: Creating...
  arn:                            "" => "<computed>"
  default_cooldown:               "" => "<computed>"
  desired_capacity:               "" => "1"
  force_delete:                   "" => "false"
  health_check_grace_period:      "" => "300"
  health_check_type:              "" => "EC2"
  launch_configuration:           "" => "devops-jenkins-lc-20180309131935169800000002"
  load_balancers.#:               "" => "1"
  load_balancers.2235174564:      "" => "devops-jenkins-elb"
  max_size:                       "" => "1"
  metrics_granularity:            "" => "1Minute"
  min_size:                       "" => "1"
  name:                           "" => "devops-jenkins-lc-20180309131935169800000002"
  protect_from_scale_in:          "" => "false"
  tags.#:                         "" => "4"
  tags.0.%:                       "" => "3"
  tags.0.key:                     "" => "Name"
  tags.0.propagate_at_launch:     "" => "1"
  tags.0.value:                   "" => "devops-jenkins"
  tags.1.%:                       "" => "3"
  tags.1.key:                     "" => "BackupDisable"
  tags.1.propagate_at_launch:     "" => "1"
  tags.1.value:                   "" => "No"
  tags.2.%:                       "" => "3"
  tags.2.key:                     "" => "Environment"
  tags.2.propagate_at_launch:     "" => "1"
  tags.2.value:                   "" => "dev"
  tags.3.%:                       "" => "3"
  tags.3.key:                     "" => "AppComponent"
  tags.3.propagate_at_launch:     "" => "1"
  tags.3.value:                   "" => "Jenkins-master"
  target_group_arns.#:            "" => "<computed>"
  vpc_zone_identifier.#:          "" => "2"
  vpc_zone_identifier.3355635847: "" => "subnet-4f13e705"
  vpc_zone_identifier.3554579391: "" => "subnet-8e92b2d3"
  wait_for_capacity_timeout:      "" => "0"
module.jenkins.aws_autoscaling_group.jenkins_asg: Creation complete after 1s (ID: devops-jenkins-lc-20180309131935169800000002)
module.vpc.aws_vpn_gateway.transit_vgw: Still creating... (10s elapsed)
module.vpc.aws_route53_zone.main: Still creating... (10s elapsed)
module.vpc.aws_nat_gateway.private_nat_gw.1: Still creating... (10s elapsed)
module.vpc.aws_nat_gateway.private_nat_gw.0: Still creating... (10s elapsed)

This results in Jenkins failing to come up properly, because the instance has no outbound route yet when cloud-init tries to install packages:

Cannot find a valid baseurl for repo: amzn-main/latest
Could not retrieve mirrorlist http://repo.us-east-1.amazonaws.com/latest/main/mirror.list error was
12: Timeout on http://repo.us-east-1.amazonaws.com/latest/main/mirror.list: (28, 'Connection timed out after 5001 milliseconds')
Mar 09 13:19:55 cloud-init[2581]: util.py[WARNING]: Failed to install packages: ['git', 'aws-cfn-bootstrap', 'docker', 'jq-libs', 'jq', 'perl-Test-Simple.noarch', 'perl-YAML.noarch', 'gcc', 'amazon-ssm-agent.rpm', 'perl-Switch', 'perl-DateTime', 'perl-Sys-Syslog', 'perl-LWP-Protocol-https', 'perl-Test-Simple.noarch', 'perl-YAML.noarch']

Now Terraform does have an "Official" VPC module from the AWS team. I have looked at its code and it doesn't appear to do anything to mitigate this. But with 90k deployments and only 36 issues, it doesn't appear to be a problem they have. I haven't tested it myself because using it isn't an option, but it might mean the issue is with my modules.

Edit: @sysadmin1138, that didn't work. I tried this…

resource "aws_autoscaling_group" "jenkins_asg" {
  depends_on                = ["module.vpc.aws_nat_gateway.private_nat_gw.1", "module.vpc.aws_nat_gateway.private_nat_gw.0"]

and got this error

Initializing the backend...

Successfully configured the backend "s3"! Terraform will automatically
use this backend unless the backend configuration changes.

Error: aws_autoscaling_group.jenkins_asg: resource depends on non-existent module 'vpc.aws_nat_gateway.private_nat_gw.1'



Error: aws_autoscaling_group.jenkins_asg: resource depends on non-existent module 'vpc.aws_nat_gateway.private_nat_gw.0'

Edit2:

I have tried adding a public_ip output from the NAT Gateway in the VPC module as an input to the Jenkins module, hoping it would hold up the Jenkins module until after the NAT Gateway is up and ready. This did not work. What I have observed with Terraform and modules is that unless you use the variable somewhere, like in the Jenkins userdata, the variable is ignored completely. It is not enough to have it as an input to the module; it has to be an input to a resource in that module. The side effect is that, as a computed value, it will try to recreate your resource every time.

Best Answer

Thanks to jbardin https://github.com/hashicorp/terraform/issues/14056

In order to fix this you need to use an output from your VPC module. You can use the aws_nat_gateway attribute public_ip but since I had a route created after the aws_nat_gateway I used that instead. I then made a dummy variable and dummy resource in my Jenkins module.

resource "null_resource" "dummy" {
  provisioner "local-exec" {
    command = "echo ${var.dummy}"
  }
}

Make sure you assign that dummy variable to the output you have chosen. It also needs to be a string:

  dummy = "${join(",", module.vpc.private_nat_gw_routes)}"

After that I used depends_on = ["null_resource.dummy"] on my ASG resource. This made that resource wait until after the NAT Gateway + routes were created but doesn't have the nasty side effect of recreating the resource every time.

No changes. Infrastructure is up-to-date.

This means that Terraform did not detect any differences between your
configuration and real physical resources that exist. As a result, no
actions need to be performed
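
For reference, here is a rough sketch of how the pieces described above might fit together, written in the same 0.11-style syntax as the rest of this post. It is a fragment rather than a complete configuration. The file layout, the module source path, and the resource name aws_route.private_nat_gw_route are assumptions made for illustration; only the private_nat_gw_routes output name, the dummy variable, null_resource.dummy and the depends_on line come from the answer itself.

```hcl
# --- VPC module: outputs (file layout is hypothetical) ---
# Assumes the VPC module creates its NAT routes as a counted resource named
# aws_route.private_nat_gw_route. That resource name is an assumption.
output "private_nat_gw_routes" {
  value = ["${aws_route.private_nat_gw_route.*.id}"]
}

# --- Jenkins module: variables ---
variable "dummy" {
  type        = "string"
  description = "Value derived from the VPC NAT routes; used only for ordering."
}

# --- Jenkins module: resources ---
# Referencing var.dummy inside a resource is what makes Terraform wait for the
# routes (and therefore the NAT Gateways) before creating this resource.
resource "null_resource" "dummy" {
  provisioner "local-exec" {
    command = "echo ${var.dummy}"
  }
}

resource "aws_autoscaling_group" "jenkins_asg" {
  # Explicit dependency: the ASG is created only after null_resource.dummy.
  depends_on = ["null_resource.dummy"]

  # ... remaining ASG arguments unchanged ...
}

# --- Root module: wiring ---
module "jenkins" {
  source = "./modules/jenkins" # hypothetical path

  # The list of route IDs is joined because the variable must be a string.
  dummy = "${join(",", module.vpc.private_nat_gw_routes)}"

  # ... other inputs such as vpc_id and subnet IDs ...
}
```

The ASG only depends on the null_resource and never consumes the computed value as one of its own arguments, which is why it is not recreated on every run.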