I don't use Elastic Beanstalk - but the guide you are following is for EC2 (which I can definitely help with). The first difficulty you have is that the guide you are using is for Ubuntu 9.10; Amazon's Linux is based on CentOS/RHEL - so you would have an easier time if you could find a CentOS 6 guide.
The root of your issue seems to stem from 'attaching an EBS volume'. On EC2 you can attach multiple EBS volumes to a single instance. All instances have a root volume, which can be either S3-backed (instance-store) or EBS-backed. By far the preferred approach is an EBS-backed root volume (it costs a bit more, but makes up for it in flexibility and durability). An instance with an EBS root volume will almost always have that volume attached as /dev/sda1 - on modern Linux systems the device actually shows up as /dev/xvda1, and it is the latter which should be passed to any commands. This is apart from the problem of trying to format a mounted volume: you were trying to format your root file system while the instance was running - i.e. you were trying to erase your own operating system, which is definitely not a good idea, if it is even possible.
In this case, the suggestion is to add a second EBS volume - attach it to your instance (e.g. as /dev/sdh, but use /dev/xvdh for commands) and use that for storing your MySQL data. Despite not using Elastic Beanstalk myself, I find it hard to believe that it would not allow you to attach a second volume - this functionality is fairly central to EC2.
You should be able to get a list of the EBS devices by running `cat /proc/partitions` (or by using `fdisk -l`).
You will note that in step 5 of what you have done, you are actually mounting the root volume a second time (i.e. /dev/sda1 is already mounted as /, and you are mounting the same /dev/sda1 again at /ebsvol) - it is best to avoid doing that.
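Putting those pieces together, a minimal sketch of the intended workflow looks like this (run as root on the instance; the device name /dev/xvdh and the mount point /ebsvol are just the examples from above - adjust to your setup):

```shell
# First, create a new EBS volume and attach it to the instance (via the
# AWS console or API) as /dev/sdh; the OS will see it as /dev/xvdh.

# Confirm the new device is visible
cat /proc/partitions

# Create a filesystem on the NEW volume only - never on xvda1 (the root volume)
mkfs -t ext4 /dev/xvdh

# Mount it
mkdir -p /ebsvol
mount /dev/xvdh /ebsvol
```

These are the same steps your guide describes, just with the device names an Amazon Linux instance will actually present.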
Also, while `/etc/init.d/mysql stop` did not work, `/etc/init.d/mysqld stop` probably would have. (Again, you can get a list of the init.d scripts by running `ls /etc/init.d` - and should be able to use those paths; like you, though, I usually use the `service` command.)
The MySQL databases should be in /var/lib/mysql - however, your mount points in /etc/fstab are probably incorrect (given the problem of /dev/sda1 being mounted at /ebsvol). When you `cd /var/lib/mysql` you should be able to see your databases - if not, your mounts haven't worked correctly. (Verify that /var/lib/mysql is mounted on a different device with `mountpoint -d /var/lib/mysql` and compare the device numbers against `cat /proc/partitions`.)
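For example (both commands are standard on Amazon's Linux; this assumes your data volume is the /dev/xvdh from earlier):

```shell
# Print the MAJOR:MINOR device number backing /var/lib/mysql
mountpoint -d /var/lib/mysql

# Look that major/minor pair up here - it should correspond to your
# data volume (e.g. xvdh), not to xvda1 (the root volume)
cat /proc/partitions
```

If the pair matches the root volume, the data directory is not actually on your second EBS volume.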
The basic ideas of the guide you are following are quite valid - it is common practice to put your data and databases on a different EBS volume than your root volume, as it offers numerous advantages (performance, ease of snapshotting, easier to move between instances, etc.), and the basic Linux commands haven't changed - the instructions are just written for Ubuntu.
Undo your mounts with `umount /path`, just like you normally would - of course, you will need to ensure that the device is not busy (which may not be a problem if you haven't managed to start MySQL). `umount` is only temporary, though, so you will also have to edit `/etc/fstab` and remove any references to those mount points. If you don't have anything of value on the instance, you might be better off starting over (not because it is difficult to unmount a few volumes, but because it is always easier to figure out where you went wrong when you start from a known state).
Finally, with regard to MySQL on Elastic Beanstalk: the point of Elastic Beanstalk is supposed to be that it handles provisioning of resources and scaling automatically - it is still based on the core AWS components (e.g. EC2, S3, ELB, etc.) but it will do some things for you. Elastic Beanstalk usually uses RDS to handle MySQL databases. RDS is an Amazon-managed version of MySQL which simplifies the provisioning and scaling of MySQL instances. Keep in mind that MySQL doesn't lend itself well to autoscaling without a lot of setup: you can't just launch a second MySQL instance and have the load split between your two instances - you need to set up replication, which may not be a simple task.
Essentially, if you are able to set up MySQL in such a way that it runs from your web server instances and can autoscale seamlessly, you'd almost certainly be better off using EC2 directly and not bothering with Elastic Beanstalk. I'd suggest, therefore, that most people don't actually set up MySQL on Elastic Beanstalk (you could set up a separate MySQL instance, but if you are using Beanstalk, RDS is probably a simpler approach).
Edit:
Unlike a lot of other services that operate mostly as a black box, Elastic Beanstalk does give you access to the underlying components. That said, if you are going to go through the effort of setting up your EC2 instances manually, you have negated the point of Elastic Beanstalk.
If you are using EC2, there are a few approaches to PHP/MySQL:
- You can host both your web server and database on a single instance - when you are starting out, this can be a reasonable approach; however, it doesn't scale horizontally very well (though you can still scale vertically, using larger instances). Hopefully, by the time you exceed the capacity of the x-large instances, you will be in a position to move to a more complex setup. That said, this approach is bad for redundancy - everything is on that single instance, and a failure of any component takes down your whole setup.
- You can host your webserver on one instance, and use RDS for your database. Most well designed applications will tax the web server more than the database (and the database load will ideally be read-biased). In such a scenario, you can scale your web server instances relatively easily (e.g. by putting them behind an ELB - with just a bit of effort to ensure that all are serving the same content). RDS is MySQL managed by AWS - it isn't quite fully automatic, but it does go a long way towards autoscaling. Essentially, RDS will provision multiple read-only slaves, and a single write-master, with multiple hot-backups that can take over if you need. The downside is that you are paying for all those instances that are running (and you don't have full control over some of the intricate settings of MySQL).
- The final approach would be to use your web server cluster and your own MySQL cluster. Essentially, you can scale your web instances (as above), and then you will setup MySQL instances that will scale separately. You will need to look into MySQL replication (or perhaps use MySQL cluster if you can adapt your application to its data structures).
My perspective is usually that one click solutions aren't the best approach - I like the control that is offered by doing something manually. I find that not only do I usually end up with a more tailored and efficient end result, but I also have a much better understanding of how the system works, which makes figuring out what is wrong much easier. You can always automate your own setups once you have a good understanding of the intricacies of them.
One point to keep in mind about RDS - it is already EBS backed. RDS is MySQL - it isn't something similar, or another relational database. It is a managed instance of MySQL running on EBS backed EC2 instances. AWS will keep the software up to date, and you can do normal EBS snapshots of your data, etc. You just don't have direct access to the underlying software running on the instance.
As for the choice of operating system, I am partial to Amazon's Linux. It is well supported by AWS and uses a minimum of resources - it is fully compatible with CentOS (as a matter of fact, it includes the EPEL repository by default in the latest version). The usual viewpoint is to use whatever Linux distribution you are comfortable with, as the differences are usually minor - CentOS will work just as well as Ubuntu for the instructions you are working from, and most commands (except apt-get) are the same on CentOS. Given that my own setup has the databases on a separate EBS volume using Amazon's Linux, I can assure you that it is not difficult to do.
I'd suggest that there are a few main considerations:
- Comfort with/willingness to learn Linux systems - if you don't mind setting up your own servers and want to get a better understanding of them, I'd definitely go the EC2 route. You'll end up with a better end result if you do it right and will have more versatility in the long run. I will mention, though, that if you are taking this approach, you want to really understand what the commands you are running do - just following a guide will not be enough if you really want to commit to it.
- Budget - remember that with AWS everything has a price. The more AWS does for you, the more they charge you. An RDS instance costs about 30% more than an equivalent EC2 instance (and there is no micro RDS instance), and if you want the redundancy they offer, you need to be running multiple RDS instances (and paying for each of those). Elastic Beanstalk will provision instances, load balancers, RDS instances, etc. for you - the costs add up quickly.
- Time - if you have no time, want to press a couple of buttons and have something functional, Elastic Beanstalk is probably the best approach for you.
I would advise against using Elastic Beanstalk with MySQL baked into your AMI - it will likely be quite unstable, if it works at all. (Just think about what happens when it adds or removes an instance in your cluster, or when data goes to one instance instead of the other...)
It is great to keep scalability in mind - but don't optimize things too soon, or you will never get anything done. Definitely keep it in mind, but if the cost (time, money, etc.) of making a particular component scalable is not practical at the moment, don't worry too much about it - when the time comes to scale it, you'll figure it out (most popular sites started out that way, after all).
If your application is designed so that it can take advantage of some caching, that will go a long way.
Typically, on EC2 it is better to scale vertically (to larger instances) than horizontally (to more instances). To begin with however, you want to scale to two instances so that you have some redundancy and minimize your single points of failure. A possible approach, therefore, may be:
- Start with a micro instance - have both your database and application on it (you can't get any smaller than this, which makes it a good starting point).
- This is of course, quite easy to scale vertically, just keep upgrading your instance until you are using x-large instances. The problem comes down to redundancy - if there is any problem with your instance, your application is offline.
- Now, you usually want to separate your database out to another instance (since a) the database will see a different load than your application, and b) you can't autoscale MySQL in quite the same way as web servers), but micro instances just don't handle load well, so I'd suggest upgrading to a larger instance first - at least a small, and then perhaps a medium (the idea being that by the time you need larger instance types, the benefit of splitting out the database is presumably greater).
- Separate your database from your web server. This will allow you to cater to the different needs of databases (e.g. high memory) vs web servers (e.g. higher CPU) and the differences between how you scale each. At this point you might decide to use RDS instead of running your own MySQL instance.
- Now that you have your application running on a dedicated instance, you can scale it without worrying about your database - set up autoscaling so that you have some redundancy. It should automatically replace application nodes as they fail and add more as load exceeds the thresholds you specify.
- Add a second database node and configure replication between your nodes (if you opt to use MySQL cluster, or NoSQL solutions, you should be able to setup autoscaling as well). Everything should at this point have redundancy, and even if a node fails, you should still be online.
- Upgrade one instance at a time to larger instance sizes as demand merits it.
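The replication step above is the fiddly part. At a minimum, it involves giving each MySQL node a unique server ID and enabling binary logging on the master - something like the my.cnf fragments below (the values are illustrative; you still need to create a replication user and point the slave at the master with CHANGE MASTER TO):

```
# my.cnf on the master
[mysqld]
server-id = 1
log-bin   = mysql-bin

# my.cnf on the slave (a separate file, on the second node)
[mysqld]
server-id = 2
relay-log = mysql-relay-bin
read-only = 1
```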
There are two kinds of storage that can be attached to an instance: ephemeral and elastic block storage (EBS). Ephemeral storage is, as the name suggests, temporary - it only exists while the instance is running (it is destroyed if the instance is stopped or terminated). Ephemeral storage cannot be 'transferred' between instances (i.e. detached from one instance and attached to another). Instance-store data does persist across reboots that do not stop the instance (i.e. running reboot from the console). EBS storage persists independently of the instance, and EBS volumes can be transferred between instances (within the same availability zone). Additionally, you can take snapshots of EBS volumes, which allow for differential (i.e. delta) storage of compressed images of the volume content (allowing you to easily create a new volume in a different region, etc.). Instances that are eligible for ephemeral storage receive it at no additional cost. EBS storage, on the other hand, is billed by both a) the amount provisioned, and b) the I/O usage.
Instance-store uses S3 to store the AMI data, and provides an ephemeral root as well as additional ephemeral storage. This can be good in the case where a task requires lots of temporary storage, with very little data being permanently retained. In general, ephemeral storage is good for temp files and swap space. If you are storing data on ephemeral volumes, it should be copied from a 'master source' at startup, and the data (on the ephemeral disk) should not be of value. (For instance, if you are running an application, you could store your code externally, download the latest version to the ephemeral disk when the server starts, and then run the application locally on the server, with all data stored elsewhere (e.g. EBS, RDS, etc.).)
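As a sketch, a startup script along these lines would pull the latest code onto the ephemeral disk each boot - the bucket name and paths are placeholders, and it assumes s3cmd (or the AWS CLI) is installed and configured:

```shell
#!/bin/sh
# Runs at boot (e.g. from /etc/rc.local or as EC2 user-data).
# Fetch the latest application code from the 'master source' (here, an
# S3 bucket - the name is a placeholder) onto the ephemeral disk.
CODE_DIR=/mnt/app              # /mnt is typically the ephemeral volume
mkdir -p "$CODE_DIR"
s3cmd sync s3://my-app-bucket/current/ "$CODE_DIR/"
# (AWS CLI equivalent: aws s3 sync s3://my-app-bucket/current/ "$CODE_DIR/")

# Run the application from the freshly downloaded copy - anything it
# writes that matters must go elsewhere (EBS, RDS, etc.), not this disk.
```

The key point is that nothing on the ephemeral disk is authoritative; losing the instance loses nothing that can't be re-fetched.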
Typically, the recommended route is to use EBS - both as the root volume and as your storage medium for data. EBS allows you to easily change instance types, modify root partition size, backup your data, and greatly facilitates dealing with problems as you can attach the EBS volumes to other running instances. Using EBS, you can store your code directly on the EBS volume, and any changes you make will persist. Moreover, you can attach a snapshot to an instance, such that all instances launched will have the same data on their attached EBS volume (i.e. the data is derived from the snapshot). Unless you have specific needs that would benefit from the instance-store architecture, EBS is the way to go.
(There is another option, but for code/databases it is typically not practical. You can mount an S3 bucket as a local file system using fuse - the advantage is unlimited, unprovisioned storage (i.e. your files can grow without having to pre-allocate an amount of space) - potentially, this is great for uploads, pictures, etc. that might be user contributed. The downside is performance - writing to S3 is not nearly as fast as EBS or ephemeral storage, and there is a significant lag that makes it unacceptable for the core components of an application.)
Recap: code must be locally available - either store it on EBS (recommended), or download it to the server's ephemeral disks (i.e. your code is copied to the machine for use, but resides elsewhere).
For MySQL you want an EBS volume if you are going to manage your own MySQL or you can use Amazon's RDS. Some people have noted that you get better performance at a lower cost running your own MySQL server. The reason for EBS here is that it will be extremely difficult to maintain an up-to-date backup of continually changing databases run off ephemeral disks. This means that if you used ephemeral disks and the instance crashed, you would lose all data since the last backup, which is generally unacceptable. Databases cannot practically be stored on S3 as the performance is insufficient for that purpose.
AWS provides 'elastic load balancers' that will distribute the load between instances that you have associated with them. It is capable of distributing load between availability zones, and tries to avoid single points of failure and the limitations (e.g. network I/O) of single instances. It does not support a 'static' IP (called an elastic IP on AWS), so you must use a CNAME to access it (i.e. you cannot map an ELB to your root domain). Also, the source IP is usually set as the ELB IP, which means you will need to use the 'X-Forwarded-For' header for logs/analysis. You can still use nginx or HAProxy as a load balancer if you desire; however, keep in mind that this results in all the network traffic passing through that single instance, which will often end up as a bottleneck if your application requires high bandwidth. As with everything else on AWS, you pay for what you use - ELBs are billed both on the time they are running and the data that passes through them.
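For example, to make Apache log the real client IP behind an ELB, you can swap the usual %h for the X-Forwarded-For header in the log format (standard Apache directives; nginx has an equivalent in its real_ip module) - the nickname and log path are illustrative:

```
LogFormat "%{X-Forwarded-For}i %l %u %t \"%r\" %>s %b" proxylog
CustomLog logs/access_log proxylog
```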
Finally, the AWS commands can be run from anywhere - they are keyed to your account by the credentials you pass them, and are run against specific resources (e.g. an instance, an EBS volume, etc.) since you specify the associated ID for each command. Only a few commands (such as bundling/uploading an AMI), which require (local) access to the files in question, must be run from a specific machine (i.e. one that has access to the required files). Even commands that reference a resource attached to an instance (e.g. taking a snapshot of an EBS volume) can be run from any machine that has the tools installed (the instance in question, your dev box, another instance, etc.).
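For instance, snapshotting a volume that is attached to one instance can be done from an entirely different machine, as long as your credentials are configured - the volume ID below is a placeholder:

```shell
# Run from any machine with the EC2 API tools configured - not
# necessarily the instance the volume is attached to
ec2-create-snapshot vol-1a2b3c4d -d "pre-upgrade backup"

# Modern AWS CLI equivalent (illustrative):
# aws ec2 create-snapshot --volume-id vol-1a2b3c4d --description "pre-upgrade backup"
```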