How to avoid duplicate scale-in with step scaling policy

amazon ec2autoscaling

With EC2 auto scaling, when using a step scaling policy (as opposed to a simple scaling policy) that scales in due to an alarm that is based on an AWS/EC2 metric (e.g. CPUUtilization <= 30%) with detailed CloudWatch monitoring disabled, when my auto scaling group scales in, it scales in two instances in short succession, without waiting for the metric to update. How can I prevent the auto scaling group from scaling in too fast for the metric to update?

Edit: here was last night’s scaling history. At 5:15, 5:17, 5:19, 5:21 UTC, auto scaling scaled in due to low CPU Utilization, even though CPU Utilization only has data points at 5:10, 5:15, 5:20, and the scaling event should have ceased after the 5:15 datapoint. There appears to be no way to adjust the cooldown duration of step scaling policy scale ins (step scaling policies ignore the Default Cooldown (=600s) and only scale out policies have estimated instance warmup).

healthy host count showing rapid scale ins and gradual scale outs CPUUUtilization with

Best Answer

I had exactly the same problem. What i ended up doing was changing my scale down policy to SimpleScaling reducing by one instance at a time and setting a 10m cooldown. I've also changed my scale down alarm condition to trigger when there are 10 periods of 60 seconds below my threshold of 35% CPU. ( I have detailed cloudwatch metrics enabled ) The idea being that as soon as a scale down occurs then the alarm will turn off faster than if there was fewer periods and a longer evaluation time.

I've still got StepScaling to go up so I scale up fast, but with SimpleScaling and a cooldown to scale in I scale down much more slowly.

Related Solutions

AWS – How to Set Auto Scaling Policy Based on Memory Usage

Auto scaling on EC2 is based on triggers from Cloudwatch. By default, Cloudwatch does not collect data about memory usage (the official reason being something to the effect of such metrics requiring 'a look into the OS running in the instance')

The solution, therefore, is to setup a custom metric to monitor memory usage attach an alarm to that metric, and then base your scaling policy off that alarm.

Amazon has described the procedure fairly well in this forum post.

Firstly, you have a script that will gather the data from 'free' (copied from the above page):

#!/bin/bash

export AWS_CLOUDWATCH_HOME=/home/ec2-user/CloudWatch-1.0.12.1
export AWS_CREDENTIAL_FILE=$AWS_CLOUDWATCH_HOME/credentials
export AWS_CLOUDWATCH_URL=https://monitoring.amazonaws.com
export PATH=$AWS_CLOUDWATCH_HOME/bin:$PATH
export JAVA_HOME=/usr/lib/jvm/jre

# get ec2 instance id
instanceid=`wget -q -O - http://169.254.169.254/latest/meta-data/instance-id`

memtotal=`free -m | grep 'Mem' | tr -s ' ' | cut -d ' ' -f 2`
memfree=`free -m | grep 'buffers/cache' | tr -s ' ' | cut -d ' ' -f 4`
let "memused=100-memfree*100/memtotal"

mon-put-data --metric-name "FreeMemoryMBytes" --namespace "System/Linux" --dimensions "InstanceId=$instanceid" --value "$memfree" --unit "Megabytes"

mon-put-data --metric-name "UsedMemoryPercent" --namespace "System/Linux" --dimensions "InstanceId=$instanceid" --value "$memused" --unit "Percent"

The script takes the number from the '-/+ buffers/cache' row under the 'free' column, as a percent of 'total' (under the 'Mem' row), and sets up 2 metrics - the percent of memory used, and the total memory free in MB.

All of the AWS API tools are very slow (relatively speaking) - if possible, use the API directly from some supported language (e.g. Ruby) and you will get much better performance than the script above.

Modify the above script to suit your needs (you probably don't need both metrics, etc) and set it up to run every few minutes via cron. Keep in mind that you get a limited number of custom/detailed metrics and alarms for free, after which their is a monthly cost.

There is also a Google Code project - 'Aws Missing Tools' that has scripts for monitoring memory usage and a few other metrics that may be helpful.

Once you have your metric setup and functioning, create an alarm for it and proceed with autoscaling (as-put-scaling-policy, etc) as you would for any of the pre-defined metrics.

How to auto-terminate EC2 instances after 24 hours

1) Why would an instance start to gradually impair after 24 hours?

There is no issue with EC2 instances that run >24hrs. Your application is probably buggy, and slowing down over time. Perhaps there is a memory leak leading to increased swapping?

2) How can I auto-terminate instances that run longer than 24 hours?

There are many ways. The simplest is probably to bundle a shell script with your application deployment that kills the instance after 24 hours. You could do that with a command like bash -c 'bash -c "sleep 3 && echo hi" &'. You can run that on application deployment by adding it to the command section of an .ebextension file in your application version.

Best Answer

Related Solutions

AWS – How to Set Auto Scaling Policy Based on Memory Usage

How to auto-terminate EC2 instances after 24 hours

Related Topic