I don't use Elastic Beanstalk - but the guide you are following is for EC2 (which I can definitely help with). The first difficulty you have is that the guide you are using is for Ubuntu 9.10; Amazon's Linux is based on CentOS/RHEL - so you would have an easier time if you could find a CentOS 6 guide.
The root of your issue seems to stem from 'attaching an EBS volume'. On EC2 you can attach multiple EBS volumes to a single instance. All instances have a root volume, which can be either S3-backed (instance-store) or EBS-backed. By far the preferred approach is an EBS-backed root volume (it costs a bit more, but makes up for it in flexibility and durability). An instance with an EBS root volume will almost always have that volume attached as /dev/sda1 - on modern Linux systems the device actually shows up as /dev/xvda1, and it is the latter which should be passed to any commands. This is apart from the problem of trying to format a mounted volume: you were trying to format your root file system while the instance was running - i.e. you were trying to erase your own operating system, which is definitely not a good idea, if it is even possible.
In this case, the suggestion is to add a second EBS volume - attach it to your instance (e.g. as /dev/sdh, but use /dev/xvdh for commands) and use that for storing your MySQL data. Despite not using Elastic Beanstalk myself, I find it hard to believe that it would not allow you to attach a second volume - this functionality is fairly central to EC2.
You should be able to get a list of the EBS devices by running `cat /proc/partitions` (or by using `fdisk -l`).
You will note that in step 5 of what you have done, you are actually mounting the root volume a second time (i.e. /dev/sda1 is already mounted as /, and you are mounting the same /dev/sda1 again at /ebsvol) - it is best to avoid doing that.
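Putting those pieces together, a minimal sketch of the intended workflow looks like this (run as root on the instance; the device name /dev/xvdh and the mount point /ebsvol are just the examples from above - adjust to your setup):

```shell
# First, create a new EBS volume and attach it to the instance (via the
# AWS console or API) as /dev/sdh; the OS will see it as /dev/xvdh.

# Confirm the new device is visible
cat /proc/partitions

# Create a filesystem on the NEW volume only - never on xvda1 (the root volume)
mkfs -t ext4 /dev/xvdh

# Mount it
mkdir -p /ebsvol
mount /dev/xvdh /ebsvol
```

These are the same steps your guide describes, just with the device names an Amazon Linux instance will actually present.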
Also, while `/etc/init.d/mysql stop` did not work, `/etc/init.d/mysqld stop` probably would have. (Again, you can get a list of the init.d scripts by running `ls /etc/init.d` - and should be able to use those paths; like you, though, I usually use the `service` command.)
The MySQL databases should be in /var/lib/mysql - however, your mount points in /etc/fstab are probably incorrect (given the problem of /dev/sda1 being mounted at /ebsvol). When you `cd /var/lib/mysql` you should be able to see your databases - if not, your mounts haven't worked correctly. (Verify that /var/lib/mysql is mounted on a different device with `mountpoint -d /var/lib/mysql` and compare the device numbers against `cat /proc/partitions`.)
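For example (both commands are standard on Amazon's Linux; this assumes your data volume is the /dev/xvdh from earlier):

```shell
# Print the MAJOR:MINOR device number backing /var/lib/mysql
mountpoint -d /var/lib/mysql

# Look that major/minor pair up here - it should correspond to your
# data volume (e.g. xvdh), not to xvda1 (the root volume)
cat /proc/partitions
```

If the pair matches the root volume, the data directory is not actually on your second EBS volume.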
The basic ideas of the guide you are following are quite valid - it is common practice to put your data and databases on a different EBS volume than your root volume, as it offers numerous advantages (performance, ease of snapshotting, easier to move between instances, etc.), and the basic Linux commands haven't changed - the instructions are just written for Ubuntu.
Undo your mounts with `umount /path`, just like you normally would - of course, you will need to ensure that the device is not busy (which may not be a problem if you haven't managed to start MySQL). `umount` is only temporary, though, so you will also have to edit `/etc/fstab` and remove any references to those mount points. If you don't have anything of value on the instance, you might be better off starting over (not because it is difficult to unmount a few volumes, but because it is always easier to figure out where you went wrong when you start from a known state).
Finally, with regard to MySQL on Elastic Beanstalk: the point of Elastic Beanstalk is supposed to be that it handles provisioning of resources and scaling automatically - it is still based on the core AWS components (e.g. EC2, S3, ELB, etc.) but it will do some things for you. Elastic Beanstalk usually uses RDS to handle MySQL databases. RDS is an Amazon-managed version of MySQL which simplifies the provisioning and scaling of MySQL instances. Keep in mind that MySQL doesn't lend itself well to autoscaling without a lot of setup: you can't just launch a second MySQL instance and have the load split between your two instances - you need to set up replication, which may not be a simple task.
Essentially, if you are able to set up MySQL in such a way that it runs from your web server instances and can autoscale seamlessly, you'd almost certainly be better off using EC2 directly and not bothering with Elastic Beanstalk. I'd suggest, therefore, that most people don't actually set up MySQL on Elastic Beanstalk (you could set up a separate MySQL instance, but if you are using Beanstalk, RDS is probably a simpler approach).
Edit:
Unlike a lot of other services that operate mostly as a black box, Elastic Beanstalk does give you access to the underlying components. That said, if you are going to go through the effort of setting up your EC2 instances manually, you have negated the point of Elastic Beanstalk.
If you are using EC2, there are a few approaches to PHP/MySQL:
- You can host both your web server and database on a single instance - when you are starting out, this can be a reasonable approach; however, it doesn't scale horizontally very well (though you can still scale vertically, using larger instances). Hopefully, by the time you exceed the capacity of the x-large instances, you will be in a position to move to a more complex setup. That said, this approach is bad for redundancy - everything is on that single instance, and a failure of any component takes down your whole setup.
- You can host your webserver on one instance, and use RDS for your database. Most well designed applications will tax the web server more than the database (and the database load will ideally be read-biased). In such a scenario, you can scale your web server instances relatively easily (e.g. by putting them behind an ELB - with just a bit of effort to ensure that all are serving the same content). RDS is MySQL managed by AWS - it isn't quite fully automatic, but it does go a long way towards autoscaling. Essentially, RDS will provision multiple read-only slaves, and a single write-master, with multiple hot-backups that can take over if you need. The downside is that you are paying for all those instances that are running (and you don't have full control over some of the intricate settings of MySQL).
- The final approach would be to use your web server cluster and your own MySQL cluster. Essentially, you can scale your web instances (as above), and then you will setup MySQL instances that will scale separately. You will need to look into MySQL replication (or perhaps use MySQL cluster if you can adapt your application to its data structures).
My perspective is usually that one click solutions aren't the best approach - I like the control that is offered by doing something manually. I find that not only do I usually end up with a more tailored and efficient end result, but I also have a much better understanding of how the system works, which makes figuring out what is wrong much easier. You can always automate your own setups once you have a good understanding of the intricacies of them.
One point to keep in mind about RDS - it is already EBS backed. RDS is MySQL - it isn't something similar, or another relational database. It is a managed instance of MySQL running on EBS backed EC2 instances. AWS will keep the software up to date, and you can do normal EBS snapshots of your data, etc. You just don't have direct access to the underlying software running on the instance.
As for the choice of operating system, I am partial to Amazon's Linux. It is well supported by AWS and uses a minimum of resources - it is fully compatible with CentOS (as a matter of fact, it includes the EPEL repository by default in the latest version). The usual viewpoint is to use whatever Linux distribution you are comfortable with, as the differences are usually minor - CentOS will work just as well as Ubuntu for the instructions you are working from, and most commands (except apt-get) are the same on CentOS. Given that my own setup has the databases on a separate EBS volume using Amazon's Linux, I can assure you that it is not difficult to do.
I'd suggest that there are a few main considerations:
- Comfort with/willingness to learn Linux systems - if you don't mind setting up your own servers and want to get a better understanding of them, I'd definitely go the EC2 route. You'll end up with a better end result if you do it right and will have more versatility in the long run. I will mention, though, that if you are taking this approach, you want to really understand what the commands you are running do - just following a guide will not be enough if you really want to commit to it.
- Budget - remember that with AWS everything has a price. The more AWS does for you, the more they charge you. An RDS instance costs about 30% more than an equivalent EC2 instance (and there is no micro RDS instance), and if you want the redundancy they offer, you need to be running multiple RDS instances (and paying for each of those). Elastic Beanstalk will provision instances, load balancers, RDS instances, etc. for you - the costs add up quickly.
- Time - if you have no time, want to press a couple of buttons and have something functional, Elastic Beanstalk is probably the best approach for you.
I would advise against using Elastic Beanstalk with MySQL baked into your AMI - it will likely be quite unstable, if it works at all. (Just think about what happens when it adds or removes an instance in your cluster, or when data goes to one instance instead of the other...)
It is great to keep scalability in mind - but don't optimize things too soon, or you will never get anything done. Definitely keep it in mind, but if the cost (time, money, etc.) of making a particular component scalable is not practical at the moment, don't worry too much about it - when the time comes to scale it, you'll figure it out (most popular sites started out that way, after all).
If your application is designed so that it can take advantage of some caching, that will go a long way.
Typically, on EC2 it is better to scale vertically (to larger instances) than horizontally (to more instances). To begin with however, you want to scale to two instances so that you have some redundancy and minimize your single points of failure. A possible approach, therefore, may be:
- Start with a micro instance - have both your database and application on it (you can't get any smaller than this, which makes it a good starting point).
- This is of course, quite easy to scale vertically, just keep upgrading your instance until you are using x-large instances. The problem comes down to redundancy - if there is any problem with your instance, your application is offline.
- Now, you usually want to separate your database out to another instance (since a) the database will see a different load than your application, and b) you can't autoscale MySQL in quite the same way as web servers), but micro instances just don't handle load well, so I'd suggest upgrading to a larger instance first - at least a small, and then perhaps a medium (the idea being that by the time you need larger instance types, the benefit of splitting out the database is presumably greater).
- Separate your database from your web server. This will allow you to cater to the different needs of databases (e.g. high memory) vs web servers (e.g. higher CPU) and the differences between how you scale each. At this point you might decide to use RDS instead of running your own MySQL instance.
- Now that you have your application running on a dedicated instance, you can scale it without worrying about your database - set up autoscaling so that you have some redundancy. It should automatically replace application nodes as they fail and add more as load exceeds the thresholds you specify.
- Add a second database node and configure replication between your nodes (if you opt to use MySQL cluster, or NoSQL solutions, you should be able to setup autoscaling as well). Everything should at this point have redundancy, and even if a node fails, you should still be online.
- Upgrade one instance at a time to larger instance sizes as demand merits it.
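The replication step above is the fiddly part. At a minimum, it involves giving each MySQL node a unique server ID and enabling binary logging on the master - something like the my.cnf fragments below (the values are illustrative; you still need to create a replication user and point the slave at the master with CHANGE MASTER TO):

```
# my.cnf on the master
[mysqld]
server-id = 1
log-bin   = mysql-bin

# my.cnf on the slave (a separate file, on the second node)
[mysqld]
server-id = 2
relay-log = mysql-relay-bin
read-only = 1
```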
There are two kinds of storage that can be attached to an instance: ephemeral and elastic block storage (EBS). Ephemeral storage is, as the name suggests, temporary - it only exists while the instance is running (it is destroyed if the instance is stopped or terminated). Ephemeral storage cannot be 'transferred' between instances (i.e. detached from one instance and attached to another). Instance-store data does persist across reboots that do not stop the instance (i.e. running reboot from the console). EBS storage persists independently of the instance, and EBS volumes can be transferred between instances (within the same availability zone). Additionally, you can take snapshots of EBS volumes, which allow for differential (i.e. delta) storage of compressed images of the volume content (allowing you to easily create a new volume in a different region, etc.). Instances that are eligible for ephemeral storage receive it at no additional cost. EBS storage, on the other hand, is billed by both a) the amount provisioned, and b) the I/O usage.
Instance-store uses S3 to store the AMI data, and provides an ephemeral root as well as additional ephemeral storage. This can be good in the case where a task requires lots of temporary storage, with very little data being permanently retained. In general, ephemeral storage is good for temp files and swap space. If you are storing data on ephemeral volumes, it should be copied from a 'master source' at startup, and the data (on the ephemeral disk) should not be of value. (For instance, if you are running an application, you could store your code externally, download the latest version to the ephemeral disk when the server starts, and then run the application locally on the server, with all data stored elsewhere (e.g. EBS, RDS, etc.).)
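As a sketch, a startup script along these lines would pull the latest code onto the ephemeral disk each boot - the bucket name and paths are placeholders, and it assumes s3cmd (or the AWS CLI) is installed and configured:

```shell
#!/bin/sh
# Runs at boot (e.g. from /etc/rc.local or as EC2 user-data).
# Fetch the latest application code from the 'master source' (here, an
# S3 bucket - the name is a placeholder) onto the ephemeral disk.
CODE_DIR=/mnt/app              # /mnt is typically the ephemeral volume
mkdir -p "$CODE_DIR"
s3cmd sync s3://my-app-bucket/current/ "$CODE_DIR/"
# (AWS CLI equivalent: aws s3 sync s3://my-app-bucket/current/ "$CODE_DIR/")

# Run the application from the freshly downloaded copy - anything it
# writes that matters must go elsewhere (EBS, RDS, etc.), not this disk.
```

The key point is that nothing on the ephemeral disk is authoritative; losing the instance loses nothing that can't be re-fetched.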
Typically, the recommended route is to use EBS - both as the root volume and as your storage medium for data. EBS allows you to easily change instance types, modify root partition size, backup your data, and greatly facilitates dealing with problems as you can attach the EBS volumes to other running instances. Using EBS, you can store your code directly on the EBS volume, and any changes you make will persist. Moreover, you can attach a snapshot to an instance, such that all instances launched will have the same data on their attached EBS volume (i.e. the data is derived from the snapshot). Unless you have specific needs that would benefit from the instance-store architecture, EBS is the way to go.
(There is another option, but for code/databases it is typically not practical. You can mount an S3 bucket as a local file system using fuse - the advantage is unlimited, unprovisioned storage (i.e. your files can grow without having to pre-allocate an amount of space) - potentially, this is great for uploads, pictures, etc. that might be user contributed. The downside is performance - writing to S3 is not nearly as fast as EBS or ephemeral storage, and there is a significant lag that makes it unacceptable for the core components of an application.)
Recap: code must be locally available - either store it on EBS (recommended), or download it to the server's ephemeral disks (i.e. your code is copied to the machine for use, but resides elsewhere).
For MySQL you want an EBS volume if you are going to manage your own MySQL or you can use Amazon's RDS. Some people have noted that you get better performance at a lower cost running your own MySQL server. The reason for EBS here is that it will be extremely difficult to maintain an up-to-date backup of continually changing databases run off ephemeral disks. This means that if you used ephemeral disks and the instance crashed, you would lose all data since the last backup, which is generally unacceptable. Databases cannot practically be stored on S3 as the performance is insufficient for that purpose.
AWS provides 'elastic load balancers' that will distribute the load between instances that you have associated with them. It is capable of distributing load between availability zones, and tries to avoid single points of failure and the limitations (e.g. network I/O) of single instances. It does not support a 'static' IP (called an elastic IP on AWS), so you must use a CNAME to access it (i.e. you cannot map an ELB to your root domain). Also, the source IP is usually set as the ELB IP, which means you will need to use the 'X-Forwarded-For' header for logs/analysis. You can still use nginx or HAProxy as a load balancer if you desire; however, keep in mind that this results in all the network traffic passing through that single instance, which will often end up as a bottleneck if your application requires high bandwidth. As with everything else on AWS, you pay for what you use - ELBs are billed both on the time they are running and the data that passes through them.
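For example, to make Apache log the real client IP behind an ELB, you can swap the usual %h for the X-Forwarded-For header in the log format (standard Apache directives; nginx has an equivalent in its real_ip module) - the nickname and log path are illustrative:

```
LogFormat "%{X-Forwarded-For}i %l %u %t \"%r\" %>s %b" proxylog
CustomLog logs/access_log proxylog
```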
Finally, the AWS commands can be run from anywhere - they are keyed to your account by the credentials you pass them, and are run against specific resources (e.g. an instance, an EBS volume, etc.) since you specify the associated ID for each command. Only a few commands (such as bundling/uploading an AMI), which require (local) access to the files in question, must be run from a specific machine (i.e. one that has access to the required files). Even commands that reference a resource attached to an instance (e.g. taking a snapshot of an EBS volume) can be run from any machine that has the tools installed (the instance in question, your dev box, another instance, etc.).
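For instance, snapshotting a volume that is attached to one instance can be done from an entirely different machine, as long as your credentials are configured - the volume ID below is a placeholder:

```shell
# Run from any machine with the EC2 API tools configured - not
# necessarily the instance the volume is attached to
ec2-create-snapshot vol-1a2b3c4d -d "pre-upgrade backup"

# Modern AWS CLI equivalent (illustrative):
# aws ec2 create-snapshot --volume-id vol-1a2b3c4d --description "pre-upgrade backup"
```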