If I release a Docker image with a bug to my ECS Service, the service attempts to start new Tasks but keeps the old version running if the new Tasks fail to start.
In that scenario, it will sometimes (not always) emit an Event to the bus like:
service xxx is unable to consistently start tasks successfully. For more information, see the Troubleshooting section.
and sometimes it will just emit loads of events like:
service xxx deregistered 1 targets in target-group yyy
I would like a CloudWatch Alarm to fire in this scenario. How can I achieve that?
I cannot see any CloudWatch metrics that track relevant events and that I could use to trigger this Alarm: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/cloudwatch-metrics.html
If the Tasks fail to boot then I don't even get any UnHealthyHostCount metrics on the LB Target Group.
I think I will have to create an EventBridge rule to watch for the events named above, but I can't see an obvious way to have that rule trigger an Alarm. I have set up a rule to forward "WARN" and "ERROR" events to SNS/email, but I don't always get these events, so I frequently get a restart loop with no alarms firing. 🙁
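For reference, a rule of the kind described above could be sketched roughly as follows. This is a minimal sketch, not the exact setup in use: it assumes ECS service events arrive with source `aws.ecs` and detail-type `ECS Service Action`, and the function names, the rule name and the topic ARN are hypothetical placeholders. The boto3 call is kept in a separate function so the pattern can be inspected without AWS credentials.

```python
import json


def ecs_warning_event_pattern():
    """Event pattern matching ECS service events whose eventType is WARN or ERROR."""
    return {
        "source": ["aws.ecs"],
        "detail-type": ["ECS Service Action"],
        "detail": {"eventType": ["WARN", "ERROR"]},
    }


def create_rule(rule_name, topic_arn):
    """Create the rule and point it at an SNS topic (hypothetical names/ARNs).

    Assumes boto3 is installed and the topic's policy lets EventBridge publish to it.
    """
    import boto3  # imported lazily so the pattern helper works without boto3

    events = boto3.client("events")
    events.put_rule(
        Name=rule_name,
        EventPattern=json.dumps(ecs_warning_event_pattern()),
        State="ENABLED",
    )
    events.put_targets(Rule=rule_name, Targets=[{"Id": "sns", "Arn": topic_arn}])
```

A rule like this only fires when ECS actually emits a WARN or ERROR event, which is exactly the gap described above.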
Best Answer
I have the following infrastructure which I think addresses this requirement:
AWS/ApplicationELB / UnHealthyHostCount, which sometimes fires
AWS/Events / TriggeredRules, which fires when 2 or 3 occurs
This is quite a messy approach, but the best I could find. I am disappointed that ECS doesn't publish metrics to track this common case.
(I do not subscribe anything to the SNS topics created above; they exist solely to make the above rules valid. The events are viewable in the ECS console if required.)
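As an illustration of the AWS/Events / TriggeredRules part, here is a rough sketch of an alarm on that metric for a single EventBridge rule. The period, threshold, alarm name and function names are assumptions chosen for the example, not values taken from the setup above, and `rule_name` and `alarm_topic_arn` are hypothetical placeholders.

```python
def triggered_rules_alarm_params(rule_name, alarm_topic_arn):
    """Build put_metric_alarm arguments for an alarm on a rule's TriggeredRules metric."""
    return {
        "AlarmName": f"{rule_name}-triggered",  # hypothetical naming scheme
        "Namespace": "AWS/Events",
        "MetricName": "TriggeredRules",
        "Dimensions": [{"Name": "RuleName", "Value": rule_name}],
        "Statistic": "Sum",
        "Period": 300,              # assumed 5-minute window
        "EvaluationPeriods": 1,
        "Threshold": 1,             # assumed: alarm on any match in the window
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        # The metric has no data points while the rule is idle, so treat gaps as OK.
        "TreatMissingData": "notBreaching",
        "AlarmActions": [alarm_topic_arn],
    }


def create_alarm(rule_name, alarm_topic_arn):
    """Create the alarm; assumes boto3 is installed and credentials are configured."""
    import boto3  # imported lazily so the helper above works without boto3

    cloudwatch = boto3.client("cloudwatch")
    cloudwatch.put_metric_alarm(**triggered_rules_alarm_params(rule_name, alarm_topic_arn))
```

Treating missing data as not breaching lets the alarm return to OK once the rule stops matching events.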