I created an alarm to stop an instance and email me if it was idle for too long (avg. CPU Utilization < 2% for 3 hours). However, in my testing I noticed that the instance was stopped after only 1 hour. Here is the report from the email:
Alarm Details:
Name: Stop
Description: Created from EC2 Console
State Change: INSUFFICIENT_DATA -> ALARM
Reason for State Change: Threshold Crossed: 2 datapoints were less than the threshold (2.0).
The most recent datapoints: 0.0425, 0.038363636363636364.
Timestamp: Thursday 14 March, 2013 22:20:11 UTC
AWS Account: xxxxxxxxxxxx
Threshold:
The alarm is in the ALARM state when the metric is LessThanThreshold 2.0 for 3600 seconds.
Monitored Metric:
MetricNamespace: AWS/EC2
MetricName: CPUUtilization
Dimensions: InstanceId = i-xxxxxxx
Period: 3600 seconds
Statistic: Average
Unit: not specified
State Change Actions:
OK:
ALARM: arn:aws:sns:us-east-1:xxxxxxxxxxxx:NotifyMe
INSUFFICIENT_DATA:
I'm confused as to why it enters the ALARM state after just 1 hour (3600s) when I set it to 3 hours (10800s). For my test, the instance had been stopped all day. Once I created the alarm I started it and didn't do anything with the instance. Does it take into account all those stopped hours when it calculates the avg CPU utilization over 3 hours?
I would like the alarm to let the instance sit idle for the full 3 hours before it stops the instance. Is there a better way to do this?
Best Answer
Your email states that the alarm is set to trigger after 3600 seconds: the period is 3600 seconds and only one period is evaluated.
There should be an option to set "EvaluationPeriods". This tells the alarm how many consecutive periods must breach the threshold before the alarm changes state. In your case you would keep the period at 3600 seconds and set EvaluationPeriods to 3, so the alarm checks once every hour whether that hour's average is LessThanThreshold 2.0. The alarm triggers only when the hourly average is LessThanThreshold 2.0 for 3 consecutive hours (10800 seconds in total). Note that each hourly datapoint is compared to the threshold on its own; the 3 datapoints are not averaged together.
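To make the Period / EvaluationPeriods relationship concrete, here is a minimal Python sketch. The dictionary keys match the parameter names of the CloudWatch PutMetricAlarm request, but the function names, the alarm naming scheme, and the `would_alarm` helper are my own illustration. `would_alarm` is a simplified model of the evaluation rule, not CloudWatch's actual implementation.

```python
def idle_stop_alarm_params(instance_id, topic_arn, idle_hours=3, threshold_pct=2.0):
    """Build keyword arguments shaped like a PutMetricAlarm request."""
    return {
        "AlarmName": f"Stop-{instance_id}",  # hypothetical naming scheme
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 3600,                    # one datapoint per hour
        "EvaluationPeriods": idle_hours,   # consecutive hours that must breach
        "Threshold": threshold_pct,
        "ComparisonOperator": "LessThanThreshold",
        "AlarmActions": [topic_arn],
    }


def would_alarm(hourly_averages, params):
    """Simplified model: ALARM only if the latest N datapoints all breach."""
    n = params["EvaluationPeriods"]
    recent = hourly_averages[-n:]
    return len(recent) == n and all(v < params["Threshold"] for v in recent)


params = idle_stop_alarm_params(
    "i-xxxxxxx", "arn:aws:sns:us-east-1:xxxxxxxxxxxx:NotifyMe"
)
print(params["Period"] * params["EvaluationPeriods"])  # idle window in seconds
print(would_alarm([0.0425, 0.0384], params))           # only two idle hours so far
print(would_alarm([0.0425, 0.0384, 0.0401], params))   # three idle hours in a row
```

If you use boto3, a dictionary like this could be passed as `boto3.client("cloudwatch").put_metric_alarm(**params)`, assuming credentials and a region are configured. Under this model a busy hour in the middle, e.g. `[0.04, 5.0, 0.04]`, does not alarm even though the average of the three values is below 2.0.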
Another thing to note is that your alarm state went from INSUFFICIENT_DATA -> ALARM. I have noticed the same behavior with some alarms I am working on.
In my case:
To mitigate this, I have set up a script so that whenever an instance is started, its alarm is created along with it, and whenever an alarm is triggered, it stops the instance it is assigned to and then deletes itself.
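A rough sketch of what such a script could look like in Python is below. This is not my actual script, just an illustration: the function names and the `Stop-<instance id>` naming scheme are made up, and the `cloudwatch` and `ec2` arguments are assumed to be boto3 clients (or anything exposing the same `put_metric_alarm`, `delete_alarms`, and `stop_instances` methods).

```python
def alarm_name_for(instance_id):
    """Hypothetical naming scheme: one alarm per instance."""
    return f"Stop-{instance_id}"


def on_instance_started(cloudwatch, instance_id, topic_arn):
    """Create a fresh idle alarm when the instance starts."""
    cloudwatch.put_metric_alarm(
        AlarmName=alarm_name_for(instance_id),
        Namespace="AWS/EC2",
        MetricName="CPUUtilization",
        Dimensions=[{"Name": "InstanceId", "Value": instance_id}],
        Statistic="Average",
        Period=3600,            # hourly datapoints
        EvaluationPeriods=3,    # three consecutive idle hours
        Threshold=2.0,
        ComparisonOperator="LessThanThreshold",
        AlarmActions=[topic_arn],
    )


def on_alarm_triggered(ec2, cloudwatch, instance_id):
    """Stop the idle instance, then remove its alarm."""
    ec2.stop_instances(InstanceIds=[instance_id])
    cloudwatch.delete_alarms(AlarmNames=[alarm_name_for(instance_id)])
```

With boto3 you would call these as `on_instance_started(boto3.client("cloudwatch"), instance_id, topic_arn)` and so on. Because the alarm is recreated on every start, it never carries over state from the time the instance was stopped.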