How to automate OS and ECS agent updates on an EC2 instance in an ECS Auto Scaling environment

amazon-ec2, amazon-ecs, amazon-web-services, automatic-updates

First off: I feel like I still don't understand some of the fundamental concepts of AWS, so please bear with me if this question is a beginner's one.

I have the following setup in AWS:

  • One ECS cluster with a single service
  • The cluster is configured to use a single EC2 instance
  • This EC2 instance is part of an Auto Scaling group that is based on a specific Launch Configuration. (The cluster setup wizard configured it like this, which makes sense, I guess.)

I'm working with a few assumptions and requirements:

  • I don't care about the EC2 instance itself, because my service is machine-agnostic.
  • My service only ever needs to run on one instance at a time. I only use ECS to have a simple way to run a dockerized application.
  • Downtime at specific, scheduled times is acceptable.
  • There is a predefined Elastic IP that has to be used with the service.
  • I want this service to be as automated as possible. When something goes wrong, we can fix things (uptime is not that critical), but I never want to SSH into the EC2 instance or anything like that.

With the help of CloudWatch and Lambda, I have set up the following tasks:

The instance is identified by the cluster name, which is automatically added to its Name tag.

  • Once a week, the cluster instance reboots. This renews certificates and configuration, because the service does that on startup. (I probably could also have scheduled the service to be killed and restarted inside the cluster somehow…)
  • Every time a new EC2 instance of the cluster starts, it is assigned the predefined Elastic IP.
  • Once a month, the EC2 instance is terminated so that the Auto Scaling group automatically replaces it with a new one.
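
To make the second task concrete, here is a minimal sketch of what such a Lambda could look like. It assumes the AWS SDK for JavaScript v2, a CloudWatch Events rule that forwards EC2 instance state-change notifications, and placeholder values (`ALLOCATION_ID`, `CLUSTER_NAME`) that are purely illustrative. It is an illustration of the idea, not the exact function in use.

```javascript
// Sketch: attach a fixed Elastic IP to a freshly started cluster instance.
// ALLOCATION_ID and CLUSTER_NAME are placeholders, not values from this post.
const ALLOCATION_ID = 'eipalloc-EXAMPLE';
const CLUSTER_NAME = 'my-cluster';

// True when the event is an EC2 state-change notification for "running".
function isRunningEvent(event) {
    return Boolean(event && event.detail &&
        event.detail.state === 'running' && event.detail['instance-id']);
}

// True when the instance's Name tag contains the cluster name.
function belongsToCluster(instance, clusterName) {
    const tags = instance.Tags || [];
    return tags.some(t => t.Key === 'Name' && (t.Value || '').includes(clusterName));
}

// `ec2` is an AWS.EC2 client (SDK v2), passed in so the function can be
// exercised with a stub instead of a real account.
async function assignElasticIp(ec2, event) {
    if (!isRunningEvent(event)) return null;
    const instanceId = event.detail['instance-id'];

    const found = await ec2.describeInstances({ InstanceIds: [instanceId] }).promise();
    const instance = found.Reservations[0].Instances[0];
    if (!belongsToCluster(instance, CLUSTER_NAME)) return null;

    // AllowReassociation lets the address move away from a previous instance.
    return ec2.associateAddress({
        AllocationId: ALLOCATION_ID,
        InstanceId: instanceId,
        AllowReassociation: true
    }).promise();
}

// Lambda wiring (needs the aws-sdk package):
// const AWS = require('aws-sdk');
// exports.handler = (event) => assignElasticIp(new AWS.EC2(), event);
```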

My hope was that once a new instance is created by the Auto Scaling group, it would have the latest AMI, including the latest ECS agent.

Correct me if I'm wrong, but when I looked at the Launch Configuration for this Auto Scaling group, I realized this won't be the case, because it always uses the AMI that was configured when it was created.


My general question is: What use is this setup if I have to check in manually every once in a while (when exactly?) to update the AMI in the Launch Configuration and then terminate the instance so that a new one replaces it?

I understand that many people probably don't want to automate OS updates in a production cluster, because they want to test them first. But still, one might want a staging environment where OS updates are applied automatically. Why use a highly automatable platform if I still need to roll out OS updates manually? Is this a conceptual misunderstanding on my side?

Best Answer

I created a Lambda function that updates the ECS container agent on every container instance in all my ECS clusters:

// Uses the AWS SDK for JavaScript v2.
const AWS = require('aws-sdk');
AWS.config.update({ region: 'sa-east-1' });

exports.handler = async (event, context) => {
    const ecs = new AWS.ECS();

    // One entry per container instance: either the API response or the error.
    const responseArray = [];

    const clusters = await ecs.listClusters({}).promise();

    for (const clusterArn of clusters.clusterArns) {
        const clusterInstances = await ecs.listContainerInstances({
            cluster: clusterArn
        }).promise();

        for (const containerInstanceArn of clusterInstances.containerInstanceArns) {
            try {
                // Ask ECS to update the container agent on this instance.
                const response = await ecs.updateContainerAgent({
                    containerInstance: containerInstanceArn,
                    cluster: clusterArn
                }).promise();

                responseArray.push({
                    cluster: clusterArn,
                    containerInstance: containerInstanceArn,
                    response: response
                });
            }
            catch (e) {
                // Also reached when the agent is already up to date
                // (NoUpdateAvailableException), so record it and move on.
                responseArray.push({
                    cluster: clusterArn,
                    containerInstance: containerInstanceArn,
                    response: e
                });
            }
        }
    }

    return responseArray;
};
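
One caveat: `listClusters` and `listContainerInstances` return their results in pages, so with many clusters or instances the loops above only see the first page. A possible way to handle that is sketched below with a hypothetical helper, `listAll`, that keeps following `nextToken` until no more pages are left. The helper name and its parameters are my own and not part of the SDK.

```javascript
// Hypothetical helper: collect every page of a paginated ECS list call.
// `ecs` is an AWS.ECS client (SDK v2), `method` is the call name
// (e.g. 'listClusters') and `key` is the array field in the response
// (e.g. 'clusterArns').
async function listAll(ecs, method, key, params = {}) {
    const items = [];
    let nextToken;
    do {
        // Only send nextToken once a previous page has returned one.
        const request = nextToken ? { ...params, nextToken } : params;
        const page = await ecs[method](request).promise();
        items.push(...(page[key] || []));
        nextToken = page.nextToken;
    } while (nextToken);
    return items;
}
```

In the handler this would replace the two list calls, for example `await listAll(ecs, 'listClusters', 'clusterArns')` and `await listAll(ecs, 'listContainerInstances', 'containerInstanceArns', { cluster: clusterArn })`.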

Then I created a CloudWatch Events rule that runs the Lambda function daily. It works well for me.
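
If you would rather create that rule from code than in the console, a rough sketch could look like this. It assumes the AWS SDK for JavaScript v2 clients `AWS.CloudWatchEvents` and `AWS.Lambda`; the rule name, target ID, statement ID, and the function ARN you pass in are placeholders I made up for the example.

```javascript
// Sketch: run a Lambda function once a day via a CloudWatch Events rule.
// RULE_NAME and the IDs below are placeholders.
const RULE_NAME = 'update-ecs-agent-daily';

// Parameters for CloudWatchEvents.putRule: a fixed-rate daily schedule.
function buildRuleParams(ruleName) {
    return { Name: ruleName, ScheduleExpression: 'rate(1 day)', State: 'ENABLED' };
}

// Parameters for CloudWatchEvents.putTargets: point the rule at the function.
function buildTargetParams(ruleName, functionArn) {
    return { Rule: ruleName, Targets: [{ Id: 'update-ecs-agent', Arn: functionArn }] };
}

// Parameters for Lambda.addPermission: allow the rule to invoke the function.
function buildPermissionParams(functionArn, ruleArn) {
    return {
        FunctionName: functionArn,
        StatementId: 'allow-daily-rule',
        Action: 'lambda:InvokeFunction',
        Principal: 'events.amazonaws.com',
        SourceArn: ruleArn
    };
}

// `events` is an AWS.CloudWatchEvents client, `lambda` an AWS.Lambda client.
// Note: addPermission fails if the StatementId already exists, so this is
// meant to be run once rather than on every deployment.
async function scheduleDaily(events, lambda, functionArn) {
    const rule = await events.putRule(buildRuleParams(RULE_NAME)).promise();
    await lambda.addPermission(buildPermissionParams(functionArn, rule.RuleArn)).promise();
    await events.putTargets(buildTargetParams(RULE_NAME, functionArn)).promise();
    return rule.RuleArn;
}
```

A `cron(...)` schedule expression would work as well if the update should run at a specific time of day.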