How to automate OS and ECS agent updates on an EC2 instance in an ECS Auto Scaling environment

Tags: amazon-ec2, amazon-ecs, amazon-web-services, automatic-updates

First off: I feel like I still don't understand some fundamental AWS concepts, so please bear with me if this question is noobish.

I have the following set-up in AWS:

One ECS cluster with a single service
The cluster is configured to use a single EC2 instance
This EC2 instance is part of an Auto Scaling group based on a specific launch configuration. (The cluster setup configured it like this, which makes sense, I guess.)

I've settled on a few assumptions and constraints:

I don't care about the EC2 instance itself, because my service is machine-agnostic.
My service only ever needs to run on one instance at a time. I only use ECS as a simple way to run a dockerized application.
Downtime at specific times is acceptable.
There is a predefined Elastic IP that has to be used with the service.
I want this service to be as automated as possible. When something goes wrong, we can fix it (uptime is not that critical), but I never want to SSH into the EC2 instance or anything like that.

With the help of CloudWatch and Lambda, I have set up the following tasks:

The instance is identified by the cluster name, which is automatically added to its Name tag.

Once a week, the cluster instance reboots. This renews certificates and configuration, because the service does that on startup. (I probably could also have scheduled the service to be killed and restarted inside the cluster somehow…)
Every time a new EC2 instance of the cluster starts, it is assigned the predefined Elastic IP.
Once a month, the EC2 instance is terminated so that the Auto Scaling group automatically replaces it with a new one.
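For reference, the two instance-level tasks above (attaching the Elastic IP and the weekly reboot) could look roughly like the sketch below. It is not the asker's actual code: it assumes an aws-sdk v2 style EC2 client (`new AWS.EC2()`) passed in as `ec2`, assumes the cluster name appears somewhere in the instance's Name tag, and the function names and allocation ID are hypothetical.

```javascript
// Sketch of the scheduled instance tasks described above.
// Assumption: `ec2` is an aws-sdk v2 EC2 client (new AWS.EC2()); the function
// names and the Elastic IP allocation ID are hypothetical placeholders.

// Find running instances whose Name tag contains the cluster name.
async function findClusterInstanceIds(ec2, clusterName) {
  const res = await ec2
    .describeInstances({
      Filters: [
        { Name: "tag:Name", Values: [`*${clusterName}*`] },
        { Name: "instance-state-name", Values: ["running"] },
      ],
    })
    .promise();
  // describeInstances groups instances by reservation, so flatten the result.
  return res.Reservations.flatMap((r) => r.Instances.map((i) => i.InstanceId));
}

// Run when a new instance starts: move the predefined Elastic IP onto it.
async function attachElasticIp(ec2, clusterName, allocationId) {
  const ids = await findClusterInstanceIds(ec2, clusterName);
  if (ids.length !== 1) {
    throw new Error(`Expected 1 running cluster instance, found ${ids.length}`);
  }
  await ec2
    .associateAddress({
      InstanceId: ids[0],
      AllocationId: allocationId,
      AllowReassociation: true, // allow taking the address from another instance
    })
    .promise();
  return ids[0];
}

// Run weekly: reboot every running cluster instance.
async function rebootClusterInstances(ec2, clusterName) {
  const ids = await findClusterInstanceIds(ec2, clusterName);
  if (ids.length > 0) {
    await ec2.rebootInstances({ InstanceIds: ids }).promise();
  }
  return ids;
}
```

Inside a Lambda handler you would call, for example, `attachElasticIp(new AWS.EC2(), "my-cluster", "eipalloc-0123456789abcdef0")` (both values hypothetical), triggered for instance by a CloudWatch Events rule on the Auto Scaling "EC2 Instance Launch Successful" event.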

My hope was that whenever the Auto Scaling group creates a new instance, it would get the latest and greatest AMI, including the latest ECS agent.

Correct me if I'm wrong, but when I looked at the launch configuration for this Auto Scaling group, I realized this won't be the case, because it always launches the AMI configured there.
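That reading is correct: a launch configuration pins a single AMI ID, and launch configurations cannot be edited after creation, so replacing instances alone never picks up a newer AMI. One way to at least detect when the pinned AMI has fallen behind is to compare it with the public SSM parameter AWS publishes for the recommended ECS-optimized AMI. This is a sketch that assumes aws-sdk v2 style `ssm` and `autoscaling` clients, an ECS-optimized Amazon Linux 2 AMI (other AMI families use other parameter paths), and a group that uses a launch configuration rather than a launch template; the function names are hypothetical.

```javascript
// Public SSM parameter holding the recommended ECS-optimized Amazon Linux 2 AMI ID.
// Assumption: the cluster runs that AMI family.
const ECS_AMI_PARAM =
  "/aws/service/ecs/optimized-ami/amazon-linux-2/recommended/image_id";

async function latestEcsAmi(ssm) {
  const res = await ssm.getParameter({ Name: ECS_AMI_PARAM }).promise();
  return res.Parameter.Value;
}

// Look up the AMI pinned in the group's current launch configuration.
async function launchConfigAmi(autoscaling, asgName) {
  const groups = await autoscaling
    .describeAutoScalingGroups({ AutoScalingGroupNames: [asgName] })
    .promise();
  const lcName = groups.AutoScalingGroups[0].LaunchConfigurationName;
  const lcs = await autoscaling
    .describeLaunchConfigurations({ LaunchConfigurationNames: [lcName] })
    .promise();
  return lcs.LaunchConfigurations[0].ImageId;
}

// Report whether the group would still launch an outdated AMI.
async function checkAmi(ssm, autoscaling, asgName) {
  const [latest, current] = await Promise.all([
    latestEcsAmi(ssm),
    launchConfigAmi(autoscaling, asgName),
  ]);
  return { current, latest, outdated: current !== latest };
}
```

If `outdated` is true, the next step is to create a new launch configuration with `latest` as its image ID and point the group at it.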

My general question is: what is the point of this setup if I still need to check in manually every once in a while (and when exactly?) to update the AMI in the launch configuration and then terminate the instance so that a new one replaces it?

I understand that many people probably don't want to automate OS updates in a production cluster, because they want to test them first. Still, one might want a staging environment where OS updates are applied automatically. Why use a highly automatable platform if I still have to roll out OS updates manually? Is this a conceptual misunderstanding on my part?

Best Answer

I created a Lambda function that updates the container agent on every container instance in all of my ECS clusters:

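// Updates the ECS container agent on every container instance of every
// cluster in one region. Written for aws-sdk v2, which Lambda bundles only in
// the nodejs16.x and older runtimes; on newer runtimes, package it yourself.
const AWS = require('aws-sdk');
AWS.config.update({ region: 'sa-east-1' }); // set to the region of your clusters

const ecs = new AWS.ECS();

// listClusters and listContainerInstances return at most 100 ARNs per call,
// so follow nextToken until all pages have been read.
async function listAllArns(method, params, key) {
    const arns = [];
    let nextToken;
    do {
        const page = await ecs[method]({ ...params, nextToken }).promise();
        arns.push(...page[key]);
        nextToken = page.nextToken;
    } while (nextToken);
    return arns;
}

exports.handler = async (event, context) => {
    const responseArray = [];

    const clusterArns = await listAllArns('listClusters', {}, 'clusterArns');

    for (const clusterArn of clusterArns) {
        const containerInstanceArns = await listAllArns(
            'listContainerInstances', { cluster: clusterArn }, 'containerInstanceArns');

        for (const containerInstanceArn of containerInstanceArns) {
            try {
                const response = await ecs.updateContainerAgent({
                    containerInstance: containerInstanceArn,
                    cluster: clusterArn
                }).promise();

                responseArray.push({
                    cluster: clusterArn,
                    containerInstance: containerInstanceArn,
                    response: response
                });
            }
            catch (e) {
                // Errors such as NoUpdateAvailableException (agent already current)
                // are recorded per instance instead of aborting the whole run;
                // only the error code and message are kept in the result.
                responseArray.push({
                    cluster: clusterArn,
                    containerInstance: containerInstanceArn,
                    error: { code: e.code, message: e.message }
                });
            }
        }
    }

    return responseArray;
};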

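Then I created a CloudWatch Events rule that runs the Lambda function daily. It works well for me. (The function's execution role needs permission for ecs:ListClusters, ecs:ListContainerInstances and ecs:UpdateContainerAgent.)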
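Creating such a daily rule from code could look like the sketch below. It assumes aws-sdk v2 style `events` (`new AWS.CloudWatchEvents()`) and `lambda` (`new AWS.Lambda()`) clients; the rule name, target ID and function ARN are hypothetical. Besides the rule and its target, Lambda needs a resource-based permission that lets `events.amazonaws.com` invoke the function.

```javascript
// Create or update a scheduled CloudWatch Events rule that invokes a Lambda.
// Assumption: aws-sdk v2 clients; the names and ARNs passed in are placeholders.
async function scheduleLambda(events, lambda, ruleName, functionArn, schedule = "rate(1 day)") {
  const { RuleArn } = await events
    .putRule({ Name: ruleName, ScheduleExpression: schedule, State: "ENABLED" })
    .promise();

  await events
    .putTargets({ Rule: ruleName, Targets: [{ Id: `${ruleName}-target`, Arn: functionArn }] })
    .promise();

  // Allow only this rule to invoke the function.
  await lambda
    .addPermission({
      FunctionName: functionArn,
      StatementId: `${ruleName}-invoke`,
      Action: "lambda:InvokeFunction",
      Principal: "events.amazonaws.com",
      SourceArn: RuleArn,
    })
    .promise();

  return RuleArn;
}
```

The same helper covers the question's other schedules, e.g. `cron(0 3 ? * SUN *)` for a weekly run on Sundays at 03:00 UTC or `cron(0 3 1 * ? *)` for the first day of each month. `addPermission` fails with a `ResourceConflictException` if the statement ID already exists, so re-running the helper needs that error handled.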

Related Solutions

AWS EC2 Instance Auto Scaling Confusion

Let's go over your questions.

So basically I want my original instance to be running at all times. Then when it starts going over capacity I want the Auto Scaling Group to start launching instances and the Load Balancer to distribute the load across them. Is my thinking here sound?

I'd say yes, but I do have a couple of reservations. If I understand correctly, you've placed your "main" instance outside of the auto scaling group. In theory, that ensures auto scaling never kills off your original instance. There are a couple of things I'd like to mention:

You're not making full use of what Auto Scaling offers. Auto Scaling not only lets your setup scale, it can also enforce limits. If, for whatever reason, your "main" instance dies, your auto scaling policy won't kick in. If you keep the instance in an auto scaling group with a minimum size of 1, Auto Scaling automatically replaces a failed instance.
When auto scaling, it's usually best practice to treat your instances as "disposable", because that's how you build resilient systems. Don't depend on any single instance always being available.
You can set the termination policy of your auto scaling group so that it always terminates the newest instances first. That keeps your "main" instance around (as long as it's healthy). My previous comment still applies, though.
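Both the minimum size and the termination policy from the list above are set with a single Auto Scaling API call. A minimal sketch, assuming an aws-sdk v2 `autoscaling` client (`new AWS.AutoScaling()`); the function name, group name and maximum size are hypothetical placeholders.

```javascript
// Keep at least one instance running and make scale-in remove the newest
// instances first, so the longest-running instance survives while healthy.
async function configureGroup(autoscaling, asgName, maxSize) {
  if (!Number.isInteger(maxSize) || maxSize < 1) {
    throw new Error("maxSize must be an integer >= 1");
  }
  const params = {
    AutoScalingGroupName: asgName,
    MinSize: 1, // with a minimum of 1, a failed instance gets replaced
    MaxSize: maxSize,
    TerminationPolicies: ["NewestInstance"],
  };
  await autoscaling.updateAutoScalingGroup(params).promise();
  return params;
}
```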

When I make code and data changes to my original instance, do I have to remake the image my Launch Configuration uses?

I'd say no, but that's more of a design issue. Your image should describe the state of your server, but it shouldn't be responsible for code distribution. Consider a situation where you have to ship an urgent bug fix while your servers are under high load. Does updating your main server, creating an AMI, updating your launch configuration and killing off your auto-scaled servers so they respawn with the latest AMI sound like an attractive solution? Again, my answer would be no. Look into source control and deployment strategies instead (for instance, I'm a Rails developer 60% of the time and use Git and Capistrano).

Your approach does work in some situations, and there is a lot of middle ground here (I'd also recommend looking into Chef and user data scripts). I rarely use custom AMIs myself, thanks to Chef.

What needs to be done with DNS names and IPs? I'm currently using Route 53; do I point it at my Load Balancer and that's it?

Basically, yes. When creating your auto scaling group, you can select the load balancer(s) that new instances should be registered with. I haven't used the GUI for Auto Scaling yet, but I'm quite sure the option is in there somewhere; if not, the CLI supports it. Then point your Route 53 record at the ELB with an alias record, and that's basically it.
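Both steps can also be scripted. This sketch assumes a Classic Load Balancer and aws-sdk v2 `autoscaling` and `route53` clients; the function names, group name, load balancer name, hosted zone ID and record name are hypothetical. For an alias record, `AliasTarget.HostedZoneId` must be the load balancer's own canonical hosted zone ID (`CanonicalHostedZoneNameID` in the Classic ELB `describeLoadBalancers` output), not the ID of your domain's hosted zone.

```javascript
// Attach an existing Classic Load Balancer to the Auto Scaling group so that
// instances the group launches are registered with it automatically.
async function attachElb(autoscaling, asgName, elbName) {
  await autoscaling
    .attachLoadBalancers({ AutoScalingGroupName: asgName, LoadBalancerNames: [elbName] })
    .promise();
}

// Build an UPSERT that points recordName at the load balancer via an alias A record.
function aliasChange(recordName, elbDnsName, elbZoneId) {
  return {
    Action: "UPSERT",
    ResourceRecordSet: {
      Name: recordName,
      Type: "A",
      AliasTarget: {
        HostedZoneId: elbZoneId, // the ELB's canonical zone, not your domain's zone
        DNSName: elbDnsName,
        EvaluateTargetHealth: false,
      },
    },
  };
}

// `elb` is a LoadBalancerDescription from the Classic ELB describeLoadBalancers call.
async function pointRecordAtElb(route53, hostedZoneId, recordName, elb) {
  return route53
    .changeResourceRecordSets({
      HostedZoneId: hostedZoneId,
      ChangeBatch: { Changes: [aliasChange(recordName, elb.DNSName, elb.CanonicalHostedZoneNameID)] },
    })
    .promise();
}
```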

Response to additional questions (2014/02/23):

If you're developing using Rails, I can highly recommend Capistrano for deployments, which can take a specific version of your app in your preferred version control system (like git) and deploy it to a number of servers in a specific environment. There are a bunch of tutorials out there, but Ryan Bates' revised (and pro) Railscasts on the subject are very concise, although you need a subscription to his website to watch both of them. Most of the other tutorials will get you going as well though. Fire up a new server with your AMI and a launch script that pulls a temporary clone of your git repo and runs a local Capistrano command to get your app going. This ensures that, later on, you can also deploy new versions of your application using Capistrano with just one command to all running servers.
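The launch script described above can be passed to new instances as user data. The sketch below only builds that script: the repository URL is hypothetical, and it assumes the AMI already has git, Ruby and Bundler installed and that your Capfile defines a stage (hypothetically named `local`) that deploys to the machine itself. The Auto Scaling and EC2 APIs take user data as base64 text (the AWS CLI encodes it for you, SDK calls generally do not), hence the separate encoding helper.

```javascript
// Build a first-boot script: clone the app into a temp dir and let Capistrano
// deploy it locally. Assumptions: git, Ruby and Bundler are on the AMI, and the
// Capfile defines a stage (hypothetically "local") that targets this machine.
function buildLaunchScript(repoUrl, stage = "local") {
  // Single-quote values for the shell, escaping embedded single quotes.
  const q = (s) => `'${String(s).replace(/'/g, `'\\''`)}'`;
  const lines = [
    "#!/bin/bash",
    "set -euo pipefail",
    'WORKDIR="$(mktemp -d)"',
    `git clone --depth 1 ${q(repoUrl)} "$WORKDIR/app"`,
    'cd "$WORKDIR/app"',
    "bundle install",
    `bundle exec cap ${q(stage)} deploy`,
    'cd / && rm -rf "$WORKDIR"',
  ];
  return lines.join("\n") + "\n";
}

// Encode the script as base64 text for the UserData field.
function toUserData(script) {
  return Buffer.from(script, "utf8").toString("base64");
}
```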

Capistrano also provides a couple of other benefits, including enabling you to execute specific tasks on all or just one of your servers and roll back a deployment, which is very hard to accomplish using just git.

How to automatically update AMI in Amazon EC2 Auto Scaling Launch Configuration

Try something like this (assuming you're on Linux with the AWS CLI installed and configured):

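#!/bin/bash
set -euo pipefail

# Define parameters
INSTANCE=i-abcd1234
ASG_NAME="current-autoscaling-group-name"
OLD_LC="old-launch-configuration-name"
NEW_LC="new-launch-configuration-name"

# Create AMI (names must be unique, so append a timestamp).
# Note: create-image reboots the instance unless you pass --no-reboot.
IMAGE=$(aws ec2 create-image --instance-id "$INSTANCE" \
    --name "NEW-IMAGE-$(date +%Y%m%d%H%M%S)" --query ImageId --output text)

# Wait until the AMI is available before referencing it
aws ec2 wait image-available --image-ids "$IMAGE"

# Create Launch Configuration (add --security-groups, --key-name, etc. to match the old one)
aws autoscaling create-launch-configuration --launch-configuration-name "$NEW_LC" \
    --image-id "$IMAGE" --instance-type t2.micro

# Update Auto Scaling Group to use new Launch Configuration
aws autoscaling update-auto-scaling-group --auto-scaling-group-name "$ASG_NAME" \
    --launch-configuration-name "$NEW_LC"

# Delete old Auto Scaling Launch Configuration
aws autoscaling delete-launch-configuration --launch-configuration-name "$OLD_LC"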
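After the group switches to the new launch configuration, running instances keep the old AMI; only instances launched afterwards use the new one. A hedged sketch of the replacement step, assuming an aws-sdk v2 `autoscaling` client and a group that uses launch configurations (the function names are hypothetical): it terminates each in-service instance whose launch configuration differs from the group's and lets Auto Scaling launch replacements. With a single instance, as in the question, this means a short outage.

```javascript
// Return IDs of in-service instances still running an older launch configuration.
function staleInstances(group) {
  return group.Instances
    .filter((i) => i.LifecycleState === "InService")
    .filter((i) => i.LaunchConfigurationName !== group.LaunchConfigurationName)
    .map((i) => i.InstanceId);
}

// Terminate stale instances without lowering desired capacity, so the group
// replaces each one with an instance built from the current launch configuration.
async function replaceStaleInstances(autoscaling, asgName) {
  const res = await autoscaling
    .describeAutoScalingGroups({ AutoScalingGroupNames: [asgName] })
    .promise();
  const group = res.AutoScalingGroups[0];
  if (!group) {
    throw new Error(`Auto Scaling group ${asgName} not found`);
  }
  const stale = staleInstances(group);
  for (const id of stale) {
    await autoscaling
      .terminateInstanceInAutoScalingGroup({ InstanceId: id, ShouldDecrementDesiredCapacity: false })
      .promise();
  }
  return stale;
}
```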
