We have two Auto Scaling groups (one for on-demand and one for spot instances), both set to a static number of instances (min, max, and desired are all the same: 5 in our case). The instances in the on-demand group stay running, but the ones in the spot group are frequently terminated due to a system health check. The message shown for a terminated instance in the Scaling History tab of the EC2 Management Console is, for example:
"At 2014-05-07T18:06:45Z an instance was taken out of service in
response to a system health-check."
I don't know why our spot instances are failing a health check. Our bid price is high, and based on the spot pricing history I don't think the instances should have been terminated due to spot price. I've also adjusted the Availability Zones the instances are launched in, with no difference. I don't see any suspicious messages when I check the syslog of a recently terminated instance. We're using a private/custom AMI for both groups, but I see the same behavior when I switch to a more generic AMI (the "Ubuntu 12.04 LTS Precise EBS boot" image listed on alestic.com – ami-5db4a934). Again, our on-demand instances stay running and don't fail health checks. We're using the "EC2" health check type.
Here is the command we're using to create our launch configuration via the AWS CLI:
aws autoscaling create-launch-configuration \
--launch-configuration-name [name] \
--image-id ami-5db4a934 \
--key-name [our key] \
--security-groups [our SGs] \
--instance-type m3.xlarge \
--block-device-mappings '[ { "DeviceName": "/dev/sda1", "Ebs": { "VolumeSize": 8 } } ]' \
--spot-price "1.00"
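For a bit more context than the console shows, we've also pulled the scaling history via the CLI; a minimal sketch (the group name is a placeholder for ours):

```shell
# List recent scaling activities for the spot group. Each activity
# includes Description, Cause, and StatusMessage fields, which carry
# more detail than the abbreviated Scaling History tab in the console.
# [our-spot-group] is a placeholder for your Auto Scaling group name.
aws autoscaling describe-scaling-activities \
    --auto-scaling-group-name [our-spot-group] \
    --max-items 20
```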
Does anyone know what this might be or how we can get more visibility into why the spot instances are failing health checks?
Best Answer
Update
Spot price contention is not the only possible cause for an Amazon EC2 Spot Instance being terminated by AWS; another notable one is capacity contention. This seems to happen in us-east-1 more often than elsewhere so far, and much more frequently in recent months for the new m3/c3/i3 instance type families (an understandable effect of ramping up capacity over time).

You can verify the actual cause of a spot request termination manually in the AWS Management Console, or e.g. via the AWS CLI's describe-spot-instance-requests. For advanced spot instance usage I'd recommend starting with Tracking Spot Requests with Bid Status Codes and correlating those codes with your instance terminations for the best operational insight. See the Life Cycle of a Spot Request and the Spot Bid Status Code Reference for more details, specifically the following reasons for spot termination by AWS:
instance-terminated-by-price
instance-terminated-no-capacity
instance-terminated-capacity-oversubscribed
instance-terminated-launch-group-constraint
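The status codes above can be read straight off your spot requests with the AWS CLI, for example:

```shell
# Show the state, status code, and status message for each spot
# instance request in the region. The status code reveals the actual
# termination reason (e.g. instance-terminated-no-capacity) that
# Auto Scaling only reports as a failed system health check.
aws ec2 describe-spot-instance-requests \
    --query 'SpotInstanceRequests[*].[SpotInstanceRequestId,State,Status.Code,Status.Message]' \
    --output table
```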
Initial Answer
This misleading message is simply the one reported when an Amazon EC2 Spot Instance has been terminated due to spot price contention; see e.g. the AWS team's response to Auto Scaling Message & Spot Instance Termination:
While it escapes me why AWS hasn't yet managed to come up with a better integration between Auto Scaling and Amazon EC2 in this regard, it makes more sense when you consider that these are in fact two separate services: if the 'external' spot market backend terminates an EC2 instance, it simply becomes 'unhealthy' from an Auto Scaling point of view. This is sort of documented in Obtaining Information About the Instances Launched by Auto Scaling: