How to construct a MongoDB Health Check in HAProxy

haproxy, load balancing, mongodb

The HAProxy blog has a FastCGI example of a binary health check. How would I construct a similar check for MongoDB, one that verifies that the server is actually there and responding rather than just checking that a port is open?

It would be useful if the health check were generic enough to work with the various MongoDB sharding components (config server, mongos, mongod).

Best Answer

First off, note that you need to be running HAProxy 1.5 or later in order to use the tcp-check feature (as of writing this answer, 1.5.3 is the current stable release). Unfortunately, Ubuntu 14.04 (for example) ships with version 1.4, so you will need to install from another source. Personally, I used the packages from here so that I could keep everything installed via apt.

The example listed on the blog is a good starting point. Using it as a template, all we need to do is pick an appropriate command to run, break that command down into hex, and construct the matching check for MongoDB. The MongoDB wire protocol is documented and published, so in theory you could build the message up from the spec, but there are easier ways to deconstruct such a command. Wireshark has built-in dissectors for MongoDB traffic, and its highlighted hex view is a handy aid here.

The command we will use here is the ping command. As you might expect, it is intended to be lightweight and to return even from a server under heavy load which makes it well suited for a health check command. Any such command can be written using the same methodology if you wish to use something else, but always be wary of using a command that requires a lock of any sort, or could add load to your database.
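For anyone who would rather derive the bytes from the published spec than from a capture, here is a minimal sketch of how a ping could be assembled as a legacy OP_QUERY message using only Python's struct module. The function names (`bson_double_doc`, `build_ping_query`), the `test` database, and the request ID `0xEEEEEEEE` are illustrative choices on my part, not anything MongoDB or HAProxy requires.

```python
import struct

OP_QUERY = 2004  # legacy query opcode; d4070000 when written little-endian


def bson_double_doc(key: str, value: float) -> bytes:
    """Encode a one-element BSON document {key: <double>}."""
    # Element = type byte 0x01 (double) + NUL-terminated name + 8-byte value.
    element = b"\x01" + key.encode("ascii") + b"\x00" + struct.pack("<d", value)
    body = element + b"\x00"  # trailing document terminator
    # The length prefix counts itself (4 bytes) plus the body.
    return struct.pack("<i", 4 + len(body)) + body


def build_ping_query(request_id: int, database: str = "test") -> bytes:
    """Assemble an OP_QUERY message running {ping: 1.0} against <database>.$cmd."""
    payload = (
        struct.pack("<i", 0)                        # query flags
        + database.encode("ascii") + b".$cmd\x00"   # fullCollectionName
        + struct.pack("<ii", 0, -1)                 # numberToSkip, numberToReturn
        + bson_double_doc("ping", 1.0)
    )
    # Standard header: messageLength, requestID, responseTo, opCode.
    header = struct.pack("<iIii", 16 + len(payload), request_id, 0, OP_QUERY)
    return header + payload


message = build_ping_query(0xEEEEEEEE)
print(len(message))   # 57
print(message.hex())
```

The hex string it prints can be cut into fields and compared against whatever Wireshark shows for the same command.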

To illustrate how to get from the command you run to the hex, here is a screenshot of the command I constructed, decoded and highlighted in Wireshark:

[Screenshot: ping command in Wireshark]

Based on that information, let's create our TCP health check. I will comment on the various pieces to explain where they come from, and each should be easy enough to find in the grab above:

option tcp-check
 # MongoDB Wire Protocol
 tcp-check send-binary 39000000 # Message Length (57)
 tcp-check send-binary EEEEEEEE # Request ID (random value)
 tcp-check send-binary 00000000 # Response To (nothing)
 tcp-check send-binary d4070000 # OpCode (Query)
 tcp-check send-binary 00000000 # Query Flags
 tcp-check send-binary 746573742e # fullCollectionName (test.$cmd)
 tcp-check send-binary 24636d6400 # continued
 tcp-check send-binary 00000000 # NumToSkip
 tcp-check send-binary FFFFFFFF # NumToReturn
 # Start of Document 
 tcp-check send-binary 13000000 # Document Length (19)
 tcp-check send-binary 01 # Type (Double)
 tcp-check send-binary 70696e6700 # Ping:
 tcp-check send-binary 000000000000f03f # Value : 1
 tcp-check send-binary 00 # Term

 tcp-check expect string ok
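Hand-transcribed hex is easy to get wrong, so it can be worth sanity-checking the send-binary lines before loading them into HAProxy. The sketch below (helper names are my own) concatenates the hex from the lines above and confirms that the two length fields agree with the actual byte counts. It only checks internal consistency, not that a server will accept the message.

```python
import re

# The send-binary lines from the check above, with comments stripped.
CHECK = """
option tcp-check
tcp-check send-binary 39000000
tcp-check send-binary EEEEEEEE
tcp-check send-binary 00000000
tcp-check send-binary d4070000
tcp-check send-binary 00000000
tcp-check send-binary 746573742e
tcp-check send-binary 24636d6400
tcp-check send-binary 00000000
tcp-check send-binary FFFFFFFF
tcp-check send-binary 13000000
tcp-check send-binary 01
tcp-check send-binary 70696e6700
tcp-check send-binary 000000000000f03f
tcp-check send-binary 00
"""


def send_binary_payload(config_text: str) -> bytes:
    """Concatenate the hex argument of every 'tcp-check send-binary' line."""
    chunks = re.findall(
        r"^\s*tcp-check\s+send-binary\s+([0-9A-Fa-f]+)", config_text, re.MULTILINE
    )
    return bytes.fromhex("".join(chunks))


def length_fields_consistent(payload: bytes) -> bool:
    """Compare the declared message and document lengths with the real sizes."""
    declared_message = int.from_bytes(payload[0:4], "little")
    # 16-byte header + 4 bytes of flags, then a NUL-terminated collection name,
    # then numberToSkip and numberToReturn (4 bytes each) before the document.
    name_end = payload.index(b"\x00", 20) + 1
    document = payload[name_end + 8:]
    declared_document = int.from_bytes(document[0:4], "little")
    return declared_message == len(payload) and declared_document == len(document)


payload = send_binary_payload(CHECK)
print(len(payload), length_fields_consistent(payload))  # 57 True
```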

It would be nice to use a full binary match on the response too, but unfortunately there is no way to predict the request ID generated by the server for each response, hence such a full match will fail (there is no way to selectively ignore pieces of a binary match).

EDIT (Sep 8th 2014): Thanks to comments on this Q&A from Baptiste and Felix, I went back and re-tested the partial binary match, which had seemed to fail initially. It turns out I had simply transcribed the response binary incorrectly, so I have amended the answer to reflect that.

Matching the "ok" string is only a basic check: any such response means that the server in question is still responding, but so limited a check is somewhat unsatisfying. While a full response match is not possible, everything after the request ID is usable.

Hence, here is the working binary check for the usable part of the response broken down, again using Wireshark to tease out the pieces as above:

# Check for response (starting after request ID)
tcp-check expect binary EEEEEEEE # Response To (from the check above)
tcp-check expect binary 01000000 # OpCode (Reply)
tcp-check expect binary 00000000 # Reply Flags (none)
tcp-check expect binary 0000000000000000 # Cursor ID (0)
tcp-check expect binary 00000000 # Starting From (0)
tcp-check expect binary 11000000 # Document Length (17)
tcp-check expect binary 01 # Type (Double) 
tcp-check expect binary 6f6b00 # ok (NUL-terminated element name)
tcp-check expect binary 000000000000f03f # value: 1
tcp-check expect binary 00 # term
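The same idea, skipping the unpredictable fields and checking the rest, can be sketched outside HAProxy as well. The function below is a rough illustration with names of my own choosing. It ignores the message length and the server-chosen request ID, checks Response To and the opcode, and then looks for the `ok: 1.0` element in the document. Per the published OP_REPLY layout, a 4-byte numberReturned field sits between Starting From and the document, which is why the document is taken to start at offset 36. The sample reply is built by hand for the demonstration, not captured from a server.

```python
import struct

OP_REPLY = 1  # opcode of the legacy reply message


def ping_reply_is_healthy(reply: bytes, sent_request_id: int) -> bool:
    """Return True if `reply` answers our request and carries {ok: 1.0}."""
    if len(reply) < 36:  # 16-byte header + 20 bytes of fixed OP_REPLY fields
        return False
    # messageLength and the server's own requestID are read but not compared.
    _length, _server_id, response_to, opcode = struct.unpack_from("<iIIi", reply, 0)
    if response_to != sent_request_id or opcode != OP_REPLY:
        return False
    # responseFlags (4), cursorID (8), startingFrom (4) and numberReturned (4)
    # occupy bytes 16..35, so the BSON document starts at offset 36.
    ok_element = b"\x01ok\x00" + struct.pack("<d", 1.0)
    return ok_element in reply[36:]


# Synthetic reply assembled by hand for illustration (assumed, not captured).
document = struct.pack("<i", 17) + b"\x01ok\x00" + struct.pack("<d", 1.0) + b"\x00"
sample = (
    struct.pack("<iIIi", 36 + len(document), 0x1234, 0xEEEEEEEE, OP_REPLY)
    + struct.pack("<iqii", 0, 0, 0, 1)
    + document
)
print(ping_reply_is_healthy(sample, 0xEEEEEEEE))  # True
print(ping_reply_is_healthy(sample, 0x11111111))  # False
```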

All of the above was tested successfully with MongoDB 2.6.4 and HAProxy 1.5.3