Auto Scaling on EC2 is driven by triggers from CloudWatch. By default, CloudWatch does not collect data about memory usage (the official reason being something to the effect that such metrics require 'a look into the OS running in the instance').
The solution, therefore, is to set up a custom metric that monitors memory usage, attach an alarm to that metric, and then base your scaling policy on that alarm.
Amazon has described the procedure fairly well in this forum post.
First, you need a script that gathers the data from 'free' (copied from the page above):
#!/bin/bash
export AWS_CLOUDWATCH_HOME=/home/ec2-user/CloudWatch-1.0.12.1
export AWS_CREDENTIAL_FILE=$AWS_CLOUDWATCH_HOME/credentials
export AWS_CLOUDWATCH_URL=https://monitoring.amazonaws.com
export PATH=$AWS_CLOUDWATCH_HOME/bin:$PATH
export JAVA_HOME=/usr/lib/jvm/jre
# Get this instance's ID from the EC2 instance metadata service
instanceid=$(wget -q -O - http://169.254.169.254/latest/meta-data/instance-id)
# Total memory in MB ('total' column of the 'Mem' row)
memtotal=$(free -m | grep 'Mem' | tr -s ' ' | cut -d ' ' -f 2)
# Free memory in MB ('free' column of the '-/+ buffers/cache' row)
memfree=$(free -m | grep 'buffers/cache' | tr -s ' ' | cut -d ' ' -f 4)
# Used memory as a percentage of total
let "memused=100-memfree*100/memtotal"
mon-put-data --metric-name "FreeMemoryMBytes" --namespace "System/Linux" --dimensions "InstanceId=$instanceid" --value "$memfree" --unit "Megabytes"
mon-put-data --metric-name "UsedMemoryPercent" --namespace "System/Linux" --dimensions "InstanceId=$instanceid" --value "$memused" --unit "Percent"
The script takes the number from the '-/+ buffers/cache' row under the 'free' column, as a percentage of 'total' (under the 'Mem' row), and publishes two metrics: the percentage of memory used, and the free memory in MB.
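To make the arithmetic concrete, here is the same calculation with made-up numbers (the memory figures below are purely illustrative):

```shell
#!/bin/bash
# Illustration of the script's math: an instance with 2048 MB total
# memory and 512 MB free (after accounting for buffers/cache).
memtotal=2048
memfree=512
# Integer arithmetic, same expression as in the script above
let "memused=100-memfree*100/memtotal"
echo "$memused"   # prints 75: 100 - (512*100/2048) = 100 - 25
```

Note that this is integer arithmetic, so the result is truncated, which is fine for an alarm threshold.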
All of the AWS API command-line tools are relatively slow; if possible, call the API directly from a supported language (e.g. Ruby) and you will get much better performance than the script above.
Modify the above script to suit your needs (you probably don't need both metrics, etc.) and set it up to run every few minutes via cron. Keep in mind that you get a limited number of custom/detailed metrics and alarms for free, after which there is a monthly cost.
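For example, a crontab entry like the following (the script path is a placeholder for wherever you saved the script above) would publish the metrics every five minutes:

```shell
# crontab -e (as ec2-user); path is hypothetical - adjust to your setup
*/5 * * * * /home/ec2-user/mon-put-mem.sh > /dev/null 2>&1
```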
There is also a Google Code project, 'Aws Missing Tools', that has scripts for monitoring memory usage and a few other metrics that may be helpful.
Once you have your metric set up and functioning, create an alarm for it and proceed with autoscaling (as-put-scaling-policy, etc.) as you would for any of the pre-defined metrics.
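As a rough sketch using the same legacy command-line tools as the script above (the group name, alarm name, instance ID, thresholds, and policy ARN are all placeholders; also note that for scaling decisions you would typically want the metric aggregated across the group rather than dimensioned on a single InstanceId):

```shell
# Create a scale-up policy for the Auto Scaling group; this prints a
# policy ARN that the alarm below references.
as-put-scaling-policy HighMemScaleUp \
  --auto-scaling-group my-asg \
  --adjustment=1 --type ChangeInCapacity --cooldown 300

# Alarm on the custom metric: fire when used memory averages above 80%
# for two consecutive 5-minute periods.
mon-put-metric-alarm --alarm-name mem-high \
  --metric-name UsedMemoryPercent --namespace "System/Linux" \
  --dimensions "InstanceId=i-12345678" \
  --statistic Average --period 300 --threshold 80 \
  --comparison-operator GreaterThanThreshold --evaluation-periods 2 \
  --alarm-actions <policy-arn-from-above>
```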
The CentOS AMI does not include the cloud-init service by default (some Ubuntu and Debian AMIs include it out of the box). You need to install it on your AMI and enable the service at boot:
chkconfig cloud-init on
Update the configuration file /etc/cloud/cloud.cfg as needed.
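As an illustration, a minimal excerpt of what /etc/cloud/cloud.cfg might contain (keys vary by cloud-init version, so treat this as a hypothetical fragment and check the documentation for your installed version):

```shell
# Hypothetical /etc/cloud/cloud.cfg excerpt
users:
 - default
# 'scripts-user' is the module that runs user-data shell scripts at boot
cloud_final_modules:
 - scripts-user
```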
Then create a new AMI from the modified instance.
To test the bootstrap script, the easiest approach I've found is to start a micro instance from this AMI, specifying the --user-data-file option.
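For example, with the legacy EC2 API tools, a test launch might look like this (the AMI ID, key pair name, and script path are placeholders):

```shell
# Launch a throwaway micro instance to exercise the bootstrap script
ec2-run-instances ami-12345678 \
  --instance-type t1.micro \
  --key my-keypair \
  --user-data-file ./bootstrap.sh
```

Then check /var/log/cloud-init.log on the instance to see whether the script ran.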
The problem is that your Tomcat servers (and most likely your workers) don't know about the RabbitMQ server. You need to do one of two things in this scenario: (a) tell them about the new server, or (b) make it so that they don't care.
For (a) above, you could notify each Tomcat server and worker when your new RabbitMQ server starts, or put the information in some list that your other components reference.
However, in this scenario, assuming you have a queue on RabbitMQ #1, what happens to that queue when you start RabbitMQ #2? You'll actually have two queues in this case, not a single queue spanning two servers. Does your application handle this?
For (b) above, take a look at RabbitMQ clustering. My understanding is that with RabbitMQ clustering, nodes can come and go, and the clients shouldn't care.
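As a rough sketch with rabbitmqctl (node and host names are placeholders; both hosts must share the same Erlang cookie, and `join_cluster` is the RabbitMQ 3.x command name):

```shell
# On the new node (rabbit2), join it to the existing node (rabbit1)
rabbitmqctl stop_app
rabbitmqctl join_cluster rabbit@rabbit1
rabbitmqctl start_app

# Verify from either node
rabbitmqctl cluster_status
```

Even with clustering, note that a queue's contents live on the node where the queue was declared unless you also configure mirrored queues, so check that part of the documentation as well.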