Monitoring / metric collection for server fleets that change a lot over time (a.k.a. cloud)

When your server fleet doesn't change much over time, as with bare-metal hosting, classic monitoring and metric collection solutions (Nagios, Munin) work well.

But if the number of systems varies a lot over time, and may in fact vary rapidly, classic software is more difficult to set up and use. E.g., trying to make Nagios (monitoring) keep up with a rapidly evolving cloud infrastructure can be cumbersome. Same for Munin (metric collection). It's not just the configuration: the way the information is conveyed to the user, or displayed, is also inadequate for the cloud.

What are some possible alternatives that work well with the cloud? The goals are to collect and display metrics (analogous to Munin), to generate alerts when certain metrics go out of bounds or when certain services are unavailable (analogous to Nagios), and to do everything in a cloud-friendly manner.

Some cloud providers offer monitoring / metric collection as services, but not all do, and if you use more than one provider you don't want to become too dependent on just one vendor. So provider-independent solutions are required.

EDIT: I am asking this question in a general fashion – not limited to any given cloud infrastructure (like OpenStack), but in the general case of using arbitrary cloud providers.

Best Answer

For systems that are short-lived or where the infrastructure changes often, I use two different tools to handle monitoring. I added a comment asking which metrics were most important to you, and it seems like you're looking for basic "what happened when?" monitoring stats with some alerting...

As systems and hardware are abstracted more via cloud services and virtualization, some of the traditional monitoring tools are less useful because you may not care about physical hardware resources and health. Application and virtual resources (from the perspective of the VM/instance/container) are what matter.

Both of the examples I give below are entirely hands-off and a default in my environments. Reinforced by Puppet, I can ensure that all systems are capturing and reporting their performance.

Pick #1 - New Relic

New Relic monitoring is agent-based and quite easy to slipstream into a provisioning or configuration management system. In my case, every server I deploy gets a Puppetized New Relic configuration, registers itself with my New Relic account and is available in the monitoring dashboard about 30-60 seconds after install. The host pushes data over standard ports, so this works well across environments. The system can unregister itself on teardown.

Main positives are 60-second granularity, a live dashboard/kiosk view, free server monitoring, and a clean presentation that is acceptable to end-users and clients.

Pick #2 - Monit and M/Monit

Monit is incredibly handy for application and basic system monitoring. Monit is an agent that is easily installed on target systems via native OS package management. It can be tailored to monitor custom applications and their relevant parameters, as well as taking actions based on those metrics. M/Monit adds a degree of centralization to the Monit checks, and allows you to aggregate data for analysis and light graphing.

Being agent-based, it's also easy to push configs to hosts in an automated fashion. I also use Puppet for this, with some creative templating to build the configuration files. Upon initialization, new servers register with the central M/Monit daemon over http/https ports, so firewalls and monitoring of multiple locations are not an issue.
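Both picks rely on the same idea: hosts announce themselves instead of being listed by hand. Here is a minimal Python sketch of that pattern. It is not how New Relic or M/Monit are implemented; the class name, the 120-second timeout and the host names are all made up for illustration.

```python
import time
from dataclasses import dataclass, field


@dataclass
class HostRegistry:
    """Tracks hosts that announce themselves; nothing is configured by hand."""

    ttl_seconds: float = 120.0  # hypothetical: forget hosts silent this long
    last_seen: dict = field(default_factory=dict)

    def heartbeat(self, host, now=None):
        # The first heartbeat registers the host; later ones refresh it.
        self.last_seen[host] = time.time() if now is None else now

    def unregister(self, host):
        # Called by a host during a clean teardown.
        self.last_seen.pop(host, None)

    def expire(self, now=None):
        # Hosts that vanished without unregistering are removed and returned,
        # so the caller can decide whether that deserves an alert.
        now = time.time() if now is None else now
        gone = sorted(
            host for host, seen in self.last_seen.items()
            if now - seen > self.ttl_seconds
        )
        for host in gone:
            del self.last_seen[host]
        return gone

    def active(self):
        # Hosts currently considered alive, in a stable order.
        return sorted(self.last_seen)
```

A new instance only has to call heartbeat() from its agent. A scale-down that runs teardown hooks calls unregister(), and anything that disappears silently shows up in the list returned by expire().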

Related Solutions

Is Zabbix the Right Monitoring Tool?

I think it would be best to concentrate on answering the specific questions you had, taking into account the size of your planned deployment (~10 monitored hosts).

What are the general disadvantages of Zabbix?
- It won't automagically figure out what to monitor, when to alert you, etc. You will have to think about which metrics you are interested in and configure them up front.
- Debugging leaves something to be desired, although with such a small environment, help options like the forum, IRC channel, etc. should easily suffice.

Does Zabbix have a small footprint on boxes it is monitoring?

Yes, definitely. Zabbix can monitor using methods like SNMP and simple network checks (is a port open?), and it also has a native agent for many platforms. As the agent is written in C, it has an extremely small footprint (as opposed to a bunch of interpreted scripts...). You can easily combine different checks on a single monitored host. Note that you are not limited to monitoring servers; you can also add network devices and other things.

Do I really need to set up an entire other server for it? I currently have a server that is under very light load -- can I dual purpose it?

Depends - if it's running one of the supported operating systems for the server - definitely. For an environment of that size, requirements will be really low. Make sure to use the default templates only as a guideline; it's suggested to create your own with longer intervals between checks. Basically, Zabbix consists of 3 components - DB, frontend, server. If you wish, you can reuse an existing database server and an existing webserver in the company for the first two components, and then run the Zabbix server on any supported platform - that's a perfectly valid configuration.
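To see why longer intervals keep the requirements low, here is a back-of-the-envelope Python sketch of how many new values per second the server has to process. The host count matches the ~10 hosts mentioned above; the 50 items per host and the two intervals are purely assumed numbers for illustration.

```python
def new_values_per_second(hosts, items_per_host, interval_seconds):
    """Average number of collected values arriving per second."""
    return hosts * items_per_host / interval_seconds


# Hypothetical figures: 10 hosts with 50 monitored items each.
frequent = new_values_per_second(10, 50, 30)   # every item checked each 30 s
relaxed = new_values_per_second(10, 50, 300)   # every item checked each 5 min

print(f"30 s interval:  {frequent:.1f} values/s")
print(f"300 s interval: {relaxed:.1f} values/s")
```

Stretching the interval tenfold cuts the write load on the database tenfold, which is what makes sharing a lightly loaded server realistic.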

Any specific queries would be very welcome in #zabbix on Freenode.

Monitoring AWS Systems Behind ElasticBeanStalk

If you are deploying a WAR on Elastic Beanstalk, you can install the metrics by creating a configuration file in the .ebextensions folder under WEB-INF. See the following link for more information on configuring an instance this way: http://docs.aws.amazon.com/elasticbeanstalk/latest/dg/customize-containers.html

To install disk / memory metrics you need to install the "Amazon CloudWatch Monitoring Scripts for Linux" - see http://aws.amazon.com/code/8720044071969977

files:
  "/opt/aws/cwms/CloudWatchMonitoringScripts.zip":
    mode: "000777"
    owner: ec2-user
    group: ec2-user
    source:  http://ec2-downloads.s3.amazonaws.com/cloudwatch-samples/CloudWatchMonitoringScripts-v1.1.0.zip
container_commands:
  01_unzip_cloud_watch_zip: 
    command: unzip -d /opt/aws/cwms /opt/aws/cwms/CloudWatchMonitoringScripts.zip
    ignoreErrors: true
  02_update_password_file:
    command: sed -i 's/Key=$/Key=<VALUE OF YOUR SECRET KEY>/;s/KeyId=$/KeyId=<VALUE OF YOUR ACCESS ID>/' /opt/aws/cwms/awscreds.conf
  03_update_crontab:    
    command: echo "*/1 * * * * /opt/aws/cwms/mon-put-instance-data.pl --mem-util --disk-path=/ --disk-space-util --from-cron" | crontab - -u ec2-user

Basically, what this script does is download the Linux-based CloudWatchMonitoringScripts.zip into a folder such as /opt/aws/cwms (this can be anywhere). The commands then unzip the file, update the access / secret key (using the "sed" command) and finally create the crontab entry.
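To give a rough idea of what the cron job reports every minute, here is a minimal Python sketch of a memory-utilization figure like the one --mem-util asks for. It parses /proc/meminfo-style text. Counting buffers and cache as available memory is an assumption on my part rather than something taken from the monitoring scripts, and the function names are made up for illustration.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kB)."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            values[key.strip()] = int(fields[0])
    return values


def mem_util_percent(info):
    """Percentage of memory in use, treating buffers/cache as available."""
    # Assumption: buffers and page cache can be reclaimed, so they count
    # as available rather than used.
    available = info["MemFree"] + info.get("Buffers", 0) + info.get("Cached", 0)
    used = info["MemTotal"] - available
    return 100.0 * used / info["MemTotal"]


if __name__ == "__main__":
    # Only works on Linux, where /proc/meminfo exists.
    with open("/proc/meminfo") as handle:
        print(f"{mem_util_percent(parse_meminfo(handle.read())):.1f}%")
```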

Be careful with the crontab section, as it could potentially wipe your existing crontab entries.
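One way around the wipe is to read the current crontab first and append to it rather than replace it. Here is a minimal Python sketch of that idea. The helper names are hypothetical, and it assumes a crontab command that accepts -u, -l and reading from "-", as the one on the instance above appears to.

```python
import subprocess


def merge_cron_entry(existing, entry):
    """Return crontab text with every existing line kept and `entry` added once."""
    # Blank lines are dropped; comments and other jobs are preserved as-is.
    lines = [line for line in existing.splitlines() if line.strip()]
    if entry not in lines:
        lines.append(entry)
    return "\n".join(lines) + "\n"


def install_cron_entry(entry, user="ec2-user"):
    """Append `entry` to the user's crontab instead of overwriting it."""
    listed = subprocess.run(
        ["crontab", "-u", user, "-l"], capture_output=True, text=True
    )
    # A non-zero exit code usually just means the user has no crontab yet.
    existing = listed.stdout if listed.returncode == 0 else ""
    subprocess.run(
        ["crontab", "-u", user, "-"],
        input=merge_cron_entry(existing, entry),
        text=True,
        check=True,
    )
```

Because merge_cron_entry() skips an entry that is already present, re-running the deployment does not pile up duplicate jobs.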

UPDATE (FEB 2016)

Here's an updated script which is working for me quite nicely as of Feb 2016 (see http://docs.aws.amazon.com/elasticbeanstalk/latest/dg/customize-containers-cw.html).

sources: 
  /opt/cloudwatch: http://ec2-downloads.s3.amazonaws.com/cloudwatch-samples/CloudWatchMonitoringScripts-v1.1.0.zip

commands:
  00-installpackages:
    command: yum install -y perl-Switch perl-Sys-Syslog perl-LWP-Protocol-https

container_commands:
  01-setupcron:
    command: |
      echo '* * * * * root perl /opt/cloudwatch/aws-scripts-mon/mon-put-instance-data.pl `{"Fn::GetOptionSetting" : { "OptionName" : "CloudWatchMetrics", "DefaultValue" : "--mem-used --memory-units=megabytes --mem-util --disk-space-util --disk-space-used --disk-space-avail --disk-path=/" }}` >> /var/log/cwpump.log 2>&1' > /etc/cron.d/cwpump
  02-changeperm:
    command: chmod 644 /etc/cron.d/cwpump
  03-changeperm:
    command: chmod u+x /opt/cloudwatch/aws-scripts-mon/mon-put-instance-data.pl

option_settings:
  "aws:autoscaling:launchconfiguration" :
    IamInstanceProfile : "MonitorRole"
  "aws:elasticbeanstalk:customoption" :
    CloudWatchMetrics : "--mem-used --memory-units=megabytes --mem-util --disk-space-util --disk-space-used --disk-space-avail --disk-path=/"

NOTE: You must have an IAM role called MonitorRole in place. Its role policy should be as follows (also see http://docs.aws.amazon.com/elasticbeanstalk/latest/dg/customize-containers-cw.html):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "cloudwatch:PutMetricData",
        "ec2:DescribeTags"
      ],
      "Effect": "Allow",
      "Resource": [
        "*"
      ]
    }
  ]
}
