Let's go over your questions.
So basically I want my original instance to be running at all times. Then when it starts going over capacity I want the Auto Scaling Group to start launching instances and the Load Balancer to distribute the load across them. Is my thinking here sound?
I'd say yes, but I do have a couple reservations. If I understand correctly, you've placed your "main" instance outside of the auto scaling group. Theoretically, that would ensure that auto scaling doesn't kill off your original instance. There are a couple of things I'd like to mention:
- You're not making full use of Auto Scaling. Auto Scaling not only lets your setup scale out, it can also enforce limits. If, for whatever reason, your "main" instance dies, your scaling policy won't bring it back. If you instead keep that instance in an auto scaling group with a `min-size` of 1, Auto Scaling automatically replaces it when it fails.
- When auto scaling, it's best practice to treat your instances as "disposable", because that's how you build resilient systems. Don't depend on any one instance always being available.
- You can set the termination policy for your auto scaling group so that it always terminates the newest instances first. That would ensure your "main" instance will be kept (as long as it's healthy). My previous comment still applies though.
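To illustrate the points above, an auto scaling group with a minimum of one instance and a newest-first termination policy could be created with the AWS CLI roughly like this (the group and launch configuration names, zones, and sizes are placeholders for your own setup):

```shell
# Create an auto scaling group that always keeps at least one instance
# alive and terminates the newest instances first when scaling in.
aws autoscaling create-auto-scaling-group \
  --auto-scaling-group-name main-asg \
  --launch-configuration-name main-lc \
  --min-size 1 \
  --max-size 4 \
  --desired-capacity 1 \
  --availability-zones us-east-1a us-east-1b \
  --termination-policies NewestInstance
```

With `--min-size 1`, Auto Scaling replaces the last remaining instance if it fails, which is exactly what placing your "main" instance outside the group loses.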
When I make code and data changes to my original instance, do I have to remake the image my Launch Configuration uses?
I'd say no, but that's more of a design issue. Your image should describe the state of your server, but it shouldn't be responsible for code distribution. Consider a situation where you have to ship an urgent bug fix while your servers are under high load. Does updating your main server, creating an AMI, updating your launch configuration, and killing off your auto scaled servers so they can be respawned with the latest AMI sound like an attractive solution? My answer to that would be no (again). Look into source code version control and deployment strategies instead (I'm a Rails developer 60% of the time and use git and Capistrano, for instance).
There are situations where your approach would work as well, and there is a lot of middle ground here (I would also recommend looking into Chef and user data scripts). I myself actually rarely use custom AMIs, thanks to Chef.
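As a sketch of the user data approach (the repository URL, paths, and service name below are hypothetical): a launch configuration can carry a small bootstrap script that pulls the latest code on first boot, so the AMI itself stays generic:

```shell
#!/bin/bash
# Example EC2 user data script (runs once at first boot, as root).
# Pulls the application code instead of baking it into the AMI.
set -e

APP_DIR=/var/www/myapp             # hypothetical deployment path
REPO=git@example.com:me/myapp.git  # hypothetical repository

# Fetch the current application code.
git clone "$REPO" "$APP_DIR"

# Hand off to whatever starts the app (init script, Chef run, etc.).
service myapp start
```

New instances then come up with the latest committed code, and the AMI only needs rebuilding when the base system changes.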
What needs to be done with DNS names and IPs? I'm currently using Route 53, do I make that point to my Load Balancer and that's it?
Basically, yes. You can select the load balancer(s) that should be attached to new instances when creating your auto scaling group. I haven't used the GUI for Auto Scaling yet, but I'm quite sure it's in there somewhere; if not, the CLI supports it. Point your Route 53 record to your ELB as an alias and that's basically it.
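For reference, the alias record part could look like this with the AWS CLI (the hosted zone IDs, domain, and ELB DNS name are placeholders; the ELB's DNS name and its hosted zone ID come from `aws elb describe-load-balancers`):

```shell
# Point example.com at the ELB via a Route 53 alias record.
aws route53 change-resource-record-sets \
  --hosted-zone-id Z1EXAMPLE \
  --change-batch '{
    "Changes": [{
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "example.com.",
        "Type": "A",
        "AliasTarget": {
          "HostedZoneId": "Z2ELBZONEID",
          "DNSName": "main-elb-1234567890.us-east-1.elb.amazonaws.com",
          "EvaluateTargetHealth": false
        }
      }
    }]
  }'
```

An alias record (rather than a CNAME) lets you map the zone apex to the ELB and follows the ELB's changing IPs automatically.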
Response to additional questions (2014/02/23):
If you're developing using Rails, I can highly recommend Capistrano for deployments, which can take a specific version of your app in your preferred version control system (like git) and deploy it to a number of servers in a specific environment. There are a bunch of tutorials out there, but Ryan Bates' revised (and pro) Railscasts on the subject are very concise, although you need a subscription to his website to watch both of them. Most of the other tutorials will get you going as well though. Fire up a new server with your AMI and a launch script that pulls a temporary clone of your git repo and runs a local Capistrano command to get your app going. This ensures that, later on, you can also deploy new versions of your application using Capistrano with just one command to all running servers.
Capistrano also provides a couple of other benefits, including enabling you to execute specific tasks on all or just one of your servers and roll back a deployment, which is very hard to accomplish using just git.
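Once Capistrano is configured, day-to-day usage boils down to a few commands (Capistrano 3 syntax shown; task names differ slightly in Capistrano 2):

```shell
# Deploy the latest committed version to all servers in the
# "production" stage:
bundle exec cap production deploy

# Roll back to the previously deployed release:
bundle exec cap production deploy:rollback

# Verify that all servers are reachable and correctly set up:
bundle exec cap production deploy:check
```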
Update
Our bid price is high, and I don't think the instances should have been terminated due to spot price (based on spot pricing history)
Spot price contention is not the only possible cause for an Amazon EC2 Spot Instance being terminated by AWS; another notable one is capacity contention:
- The capacity available for spot instances depends on the demand for regular instances: if there aren't any instances of a specific type available for users requesting regular on-demand instances, AWS will start terminating spot instances to fulfill those requests.
- In fact, I've encountered this in us-east-1 more often than elsewhere so far, and much more frequently in recent months for the new m3/c3/i3 instance type families (an understandable effect of ramping up capacity over time).
You can verify the actual cause of a spot request termination manually in the AWS Management Console or, e.g., via the AWS CLI's describe-spot-instance-requests. For advanced spot instance usage, I'd recommend Tracking Spot Requests with Bid Status Codes and correlating these with your instance terminations for the best operational insight. See the Life Cycle of a Spot Request and the Spot Bid Status Code Reference for more details, specifically the following reasons for spot termination by AWS:
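For example, a quick way to pull the bid status codes of your spot requests with the AWS CLI (the `--query` expression just trims the output to the relevant fields):

```shell
# List each spot request with its state and bid status code, e.g.
# instance-terminated-no-capacity or instance-terminated-by-price.
aws ec2 describe-spot-instance-requests \
  --query 'SpotInstanceRequests[*].[SpotInstanceRequestId,State,Status.Code,Status.Message]' \
  --output table
```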
instance-terminated-by-price
The Spot Price rose above your bid price. If your request is a
persistent bid, the process—or life cycle—restarts and your bid will
again be pending evaluation.
instance-terminated-no-capacity
There is no longer any Spot capacity available for the instance.
instance-terminated-capacity-oversubscribed
Your instance was terminated because the number of Spot requests with
bid prices equal to or higher than your bid price has exceeded the
available capacity in this pool. This means that your instance was
interrupted even though the Spot Price may not have changed because
your bid was at the Spot Price.
instance-terminated-launch-group-constraint
One of the instances in your launch group was terminated, so the
launch group constraint is no longer fulfilled.
Initial Answer
"At 2014-05-07T18:06:45Z an instance was taken out of service in response to a system health-check."
This misleading message simply is the one reported when the Amazon EC2 Spot Instance has been terminated due to spot price contention, see e.g. the AWS team's response to Auto Scaling Message & Spot Instance Termination:
You are correct the instance was terminated due to spot pricing.
The instance terminated right before the health-check so it was taken out of service since it was still associated to the AS group.
While it escapes me why AWS hasn't managed to come up with a better integration between Auto Scaling and Amazon EC2 in this regard, it makes more sense when you consider that these are in fact two separate services: if the 'external' spot market backend terminates an EC2 instance, that instance simply becomes 'unhealthy' from an Auto Scaling point of view. This is sort of documented in Obtaining Information About the Instances Launched by Auto Scaling:
- Cause: At 2012-06-01T00:47:51Z an instance was taken out of service in response to a system health-check. Description: Terminating EC2 instance: i-88ce28f1
Auto Scaling maintains the desired
number of instances by monitoring the health status of the instances
in the Auto Scaling group. When Auto Scaling receives notification
that an instance is unhealthy or terminated, Auto Scaling launches
another instance to take the place of the unhealthy instance. [...]
Note
Auto Scaling provides the cause of instance termination that is not the result of a scaling activity. This includes instances that
have been terminated because the Spot Price exceeded their bid price. [emphasis mine]
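The termination cause quoted above can be retrieved for your own group via the Auto Scaling CLI (the group name is a placeholder):

```shell
# Show recent scaling activities, including the Cause field that
# reports non-scaling terminations such as spot price events.
aws autoscaling describe-scaling-activities \
  --auto-scaling-group-name main-asg \
  --max-records 10
```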
Best Answer
You can only use public AMIs and private AMIs that have been explicitly shared with you. If you used a private AMI that is no longer shared with you, you can no longer launch instances from it. In the future, if you intend to use a private AMI that was shared with you, copy the AMI first, in case the original is unshared or deleted, and launch instances from your private copy.
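Copying a shared AMI into your own account can be done with the EC2 CLI (the AMI ID, region, and name here are placeholders):

```shell
# Make a private copy of an AMI that was shared with you, so it
# remains usable even if the owner unshares or deletes the original.
aws ec2 copy-image \
  --source-region us-east-1 \
  --region us-east-1 \
  --source-image-id ami-1234abcd \
  --name "my-private-copy"
```

The copy is owned by your account, so subsequent launch configurations can reference it without depending on the original owner's sharing settings.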