This guide doesn't cover setting up HAProxy or why you'd want to. If you already have it and want to monitor it properly with Icinga, here's one way you could go about it.
So here is a potential scenario:
- 2 data centers A and B
- 1 HAProxy node per data center
- Each HAProxy node points to the 2 web servers in its own data center: A1, A2 and B1, B2
- The web servers in this scenario really expose a Web Service endpoint, so a simple HTTP GET to a URL doesn't tell you much about the actual health of the system
Monitoring-wise you could settle for an external check (Pingdom or similar) of your currently active nodes. That has some drawbacks though:
- You would not be testing the passive nodes, so before a failover you have no real assurance that they work
- A failure of one node won't give you a clear indication of what is wrong
So here is a paranoid person's approach:
- I want to monitor each node all the way through, from the external IP(s), through HAProxy and into the system, to catch any glitch along the way
- I want to make an actual Web Service call to the back-end service to verify that it's working (obviously not applicable if you're testing a normal web site)
Let's get to it then…
Best Answer
First of all you'll need to enable cookie insertion in HAProxy and assign each back-end node its own unique cookie value. This is usually used for session stickiness, i.e. you want a visitor to your site to keep getting the same back-end node as long as it's still available. But it can also be used to monitor individual nodes by sending the appropriate cookie. So if they're not already present, add cookies to your HAProxy server definitions:
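A minimal sketch of what that could look like (the backend name and server addresses are placeholders; match them to your own config):

```
backend web_a
    balance roundrobin
    # Insert a SERVERID cookie naming the chosen node; 'indirect nocache'
    # keeps the cookie away from the back ends and out of shared caches.
    cookie SERVERID insert indirect nocache
    server A1 10.0.1.11:80 check cookie A1
    server A2 10.0.1.12:80 check cookie A2
```

A request carrying `Cookie: SERVERID=A2` will then be routed to node A2 (if it's up) instead of being load-balanced.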
Secondly, you'll need to figure out what makes the most sense to check; that takes some thinking and fiddling on your own, along with working out how to express it using Nagios's excellent check_http. For completeness, the example below tests a POST toward a back-end Web Service.
This can be handled with arguments to check_http (/usr/lib64/nagios/plugins/check_http on CentOS 6).
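For illustration only, a hypothetical invocation; the host, URI, payload and expected response string below are placeholders, not real values:

```
/usr/lib64/nagios/plugins/check_http \
    -H www.example.com \
    -u /service/endpoint \
    -k 'Content-Type: text/xml' \
    -P '<request><ping/></request>' \
    -r '<status>OK</status>' \
    -t 20
```

`-P` makes check_http send the given body as a POST, `-k` adds an extra request header, `-r` requires the response body to match the regex, and `-t` sets the timeout; add `-S` if the service sits behind SSL.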
Now, all of this put together should give you a nice OK output; get that working first.
Then it's time for the custom parts: enabling node selection through the cookie, and optionally passing in an IP you can use to override DNS, for example when you want to check a path through a passive data center. To do this we'll write a small shell script wrapper around check_http. It takes the host name of the back-end node as its first parameter (for convenience, use what Icinga considers the host name to be) and an optional second parameter overriding the IP of the server to check (bypassing the DNS lookup). The result is a shell script looking something like this (I suggest putting it in /usr/lib64/nagios/plugins/ and chown/chmod it like the other plugins in there):
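A hypothetical sketch of that wrapper, written as a function for clarity; PLUGIN_DIR, SITE_HOST and URI are assumptions you'd adjust to your own environment:

```shell
#!/bin/sh
# Sketch of a check_node wrapper -- SITE_HOST and URI are placeholders.
PLUGIN_DIR=/usr/lib64/nagios/plugins
SITE_HOST=www.example.com      # public host name that resolves to HAProxy
URI=/service/endpoint          # path to check on the back end

check_node() {
    node="$1"   # back-end node name as HAProxy/Icinga knows it, e.g. A1
    ip="$2"     # optional: connect to this IP instead of resolving SITE_HOST
    if [ -z "$node" ]; then
        echo "UNKNOWN: usage: check_node <backend-node> [frontend-ip]"
        return 3
    fi
    # The SERVERID cookie pins HAProxy to the requested node. -I overrides
    # the address actually connected to while -H still sends the real Host:
    # header, so DNS is bypassed but the request is routed normally.
    if [ -n "$ip" ]; then
        "$PLUGIN_DIR/check_http" -H "$SITE_HOST" -u "$URI" \
            -k "Cookie: SERVERID=$node" -I "$ip"
    else
        "$PLUGIN_DIR/check_http" -H "$SITE_HOST" -u "$URI" \
            -k "Cookie: SERVERID=$node"
    fi
}

# When saved as a standalone plugin, end the file with: check_node "$@"
```

Combine the check_http arguments here with whatever POST/response checks you worked out above; the cookie and the `-I` override are the only parts this wrapper adds.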
Note that SERVERID is the name of the cookie set in haproxy.
Once this is in place you can define your Nagios check commands along these lines:
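As a hypothetical illustration (the command and service names, and the generic-service template, are assumptions):

```
define command {
    command_name    check_node
    command_line    /usr/lib64/nagios/plugins/check_node $HOSTNAME$ $ARG1$
}

# Check node A1 by going in through data center A's external IP:
define service {
    use                     generic-service
    host_name               A1
    service_description     Web Service health via DC A
    check_command           check_node!<A external IP>
}
```

Leaving off the `!<A external IP>` argument makes the wrapper fall back to a normal DNS lookup of the site's host name.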
Where check_node is the name of the wrapper script and 'A external IP' is the IP used to reach the system in data center A.
This would have saved me a lot of time over the last few days, so I hope it sends you in the right direction too.