Wait for AutoScaling Groups with Terraform

When using Terraform, there is no built-in way to know when the EC2 instances that are part of an AutoScaling Group are completely ready unless you use an ELB. By completely ready, I mean the instances have finished running their cloud-init process (AKA userdata). See this GitHub issue for more details. But if we can reach the instances inside the VPC, there’s a workaround!

To do this, you’ll create your AutoScaling Group as usual:

resource "aws_autoscaling_group" "cluster" {
  desired_capacity     = var.ec2-instance-count
  launch_configuration = aws_launch_configuration.cluster.id
  max_size             = var.ec2-instance-count
  min_size             = 1
  name                 = "cluster"
  vpc_zone_identifier  = var.subnet-ids

  tag {
    key                 = "Name"
    value               = "cluster"
    propagate_at_launch = true
  }
}
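
The group above refers to aws_launch_configuration.cluster, which this post does not show. A minimal sketch of what it might look like follows. The variables var.ami-id, var.instance-type and var.ec2-key-name and the userdata.sh file are made-up names for illustration; the user_data script is whatever cloud-init has to finish before we consider an instance ready.

```hcl
# Sketch only: variable names and userdata.sh are hypothetical,
# not taken from the original configuration.
resource "aws_launch_configuration" "cluster" {
  name_prefix   = "cluster-"
  image_id      = var.ami-id
  instance_type = var.instance-type
  key_name      = var.ec2-key-name

  # cloud-init runs this script on first boot; this is the work we wait for later.
  user_data = file("${path.module}/userdata.sh")

  # Launch configurations are immutable, so create the replacement first.
  lifecycle {
    create_before_destroy = true
  }
}
```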

Then, we’ll create a data source that’s meant to wait for the AutoScaling Group to be created:

data "aws_instances" "cluster" {
  filter {
    name   = "tag:aws:autoscaling:groupName"
    values = [aws_autoscaling_group.cluster.name]
  }
}

Because the filter references the AutoScaling Group’s name, Terraform reads this data source only after the group has been created, so the instance information is populated once the group is ready. Now for the trick…

Create a Terraform module and feed the data source as an input to that module. The variable should accept a list as input:

variable "cluster-ips" {
  type = list(string)
}
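
The module shown later in this post also reads var.ec2-instance-count and var.ec2-key-location, so it needs declarations for those as well. A sketch of what they could look like, assuming a number and a file path; the types and descriptions are my guess, not something the original configuration spells out:

```hcl
# Assumed declarations for the other variables the module uses.
variable "ec2-instance-count" {
  type        = number
  description = "How many instances to wait for (the group's desired capacity)."
}

variable "ec2-key-location" {
  type        = string
  description = "Path to the private key used for SSH."
  # No default here: the caller has to pass a value,
  # or you can add a default that fits your setup.
}
```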

The call to the module should look something like this:

module "cluster-wait-for" {
  source             = "./cluster-wait-for"
  # Pass the list of private IPs, since we reach the instances from inside the VPC.
  cluster-ips        = data.aws_instances.cluster.private_ips
  ec2-instance-count = var.ec2-instance-count
}

And now, inside that module, we create a null_resource to ssh to the nodes and wait for cloud-init to finish:

resource "null_resource" "wait-for-cloud-init-cluster" {
  count = var.ec2-instance-count

  # Re-run the provisioner whenever the set of instance IPs changes.
  triggers = {
    workers = join(",", sort(var.cluster-ips))
  }

  connection {
    type        = "ssh"
    user        = "ec2-user"
    host        = var.cluster-ips[count.index]
    private_key = file(var.ec2-key-location)
  }

  # Blocks until cloud-init (userdata) has finished on the instance.
  provisioner "remote-exec" {
    inline = [
      "sudo cloud-init status --wait"
    ]
  }
}

Now, whenever the set of instances in the AutoScaling Group changes (thanks to the trigger), this resource will SSH into the instances and wait until cloud-init has finished running.
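
If something else should only run after the instances are ready, one option is to expose the null_resource IDs from the module and reference them downstream. This is a sketch under my own assumptions: the output name "ready" and the resource "after-cluster-ready" are invented for illustration.

```hcl
# Inside the cluster-wait-for module: expose the IDs of the wait resources.
# The output name "ready" is hypothetical.
output "ready" {
  value = null_resource.wait-for-cloud-init-cluster[*].id
}

# In the calling configuration: referencing the output makes Terraform
# create this resource only after every wait resource has completed.
resource "null_resource" "after-cluster-ready" {
  triggers = {
    ready = join(",", module.cluster-wait-for.ready)
  }

  provisioner "local-exec" {
    command = "echo 'cloud-init finished on all cluster instances'"
  }
}
```

Because the IDs change whenever the wait resources are re-created, the downstream resource is re-run along with them.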