There is a fastcgi example of a binary health check on the HAProxy blog. How would I construct a similar check for MongoDB, so that I get a more robust health check: one that verifies that the server is actually there and responding rather than just checking that a port is open?
It would be useful if the health check was generic enough to work with the various MongoDB sharding components (config server, mongos, mongod).
Best Answer
First off, it is worth noting that you will have to be running HAProxy 1.5 or later in order to use the tcp-check feature (as of writing this answer, 1.5.3 is the current stable release). Unfortunately, Ubuntu 14.04 (for example) ships with version 1.4, so you will need to install from another source. Personally, I used packages from a separate package repository so that I could keep everything installed via apt.
The example listed on the blog is a good starting point. Using it as a template, all we need to do is pick an appropriate command to run, break that command down into hex, and construct the appropriate check for MongoDB. The MongoDB wire protocol is documented and published, so in theory you could build the message up from the spec, but there are easier ways to deconstruct such a command. Wireshark has built-in dissectors that let you inspect MongoDB traffic, and it provides a handy view of the hex with highlighting to aid us in our efforts here.
The command we will use here is the ping command. As you might expect, it is intended to be lightweight and to return even from a server under heavy load, which makes it well suited for a health check. Any other command can be encoded using the same methodology if you wish to use something else, but always be wary of using a command that requires a lock of any sort, or that could add load to your database.
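To make the "break the command down into hex" step concrete, here is a minimal Python sketch that assembles a ping command as a legacy OP_QUERY message against admin.$cmd and prints each field as hex. The function names are made up for illustration, the request ID of 1 is arbitrary, and the value of ping is encoded as an int32, which is an assumption: a capture taken from the mongo shell will likely show a double instead, which changes the document and message lengths. It also assumes a server that still accepts OP_QUERY, as the 2.6-era server discussed here did.

```python
import struct

OP_QUERY = 2004  # legacy opcode; little-endian on the wire as d4070000


def bson_ping_document():
    """Encode {ping: 1} as BSON, with the value as an int32 (an assumption)."""
    element = b"\x10" + b"ping\x00" + struct.pack("<i", 1)  # type, name, value
    body = element + b"\x00"  # trailing document terminator
    return struct.pack("<i", len(body) + 4) + body  # length counts itself


def build_ping_fields(request_id=1):
    """Return the message as an ordered list of (field name, raw bytes)."""
    tail = [
        ("flags", struct.pack("<i", 0)),
        ("fullCollectionName", b"admin.$cmd\x00"),
        ("numberToSkip", struct.pack("<i", 0)),
        ("numberToReturn", struct.pack("<i", -1)),
        ("document", bson_ping_document()),
    ]
    # The standard header is 16 bytes and messageLength covers the whole message.
    length = 16 + sum(len(raw) for _, raw in tail)
    header = [
        ("messageLength", struct.pack("<i", length)),
        ("requestID", struct.pack("<i", request_id)),
        ("responseTo", struct.pack("<i", 0)),
        ("opCode", struct.pack("<i", OP_QUERY)),
    ]
    return header + tail


def build_ping_message(request_id=1):
    """Concatenate the fields into the bytes that would go on the wire."""
    return b"".join(raw for _, raw in build_ping_fields(request_id))


if __name__ == "__main__":
    for name, raw in build_ping_fields():
        print(f"{raw.hex():<32} # {name}")
```

Each printed hex string can be used as the argument to a tcp-check send-binary line in the backend, with the field name kept as a trailing comment. Comparing the output against your own Wireshark capture is the easiest way to spot a mistyped byte.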
To illustrate how to get from the command you run to the hex, here is a small shot of the command I have constructed highlighted in Wireshark, having been decoded:
[Screenshot: the decoded ping command highlighted in Wireshark]
Based on that information, let's create our TCP health check. I will comment on the various pieces to explain where they come from, and each should be easy enough to find in the grab above:
[Listing: the commented tcp-check configuration]
It would be nice to use a full binary match on the response too, but unfortunately there is no way to predict the request ID generated by the server for each response, hence such a full match will fail (there is no way to selectively ignore pieces of a binary match).
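To see why the request ID gets in the way, here is a small Python sketch that unpacks the 16-byte standard header found at the start of every wire protocol message and compares two replies. The replies are synthetic (a 4-byte placeholder body, invented request IDs), and the helper names are purely illustrative; the point is only to show which bytes move between otherwise identical responses.

```python
import struct


def parse_reply_header(reply):
    """Unpack the 16-byte standard header that starts every wire message."""
    length, request_id, response_to, op_code = struct.unpack_from("<iiii", reply, 0)
    return {
        "messageLength": length,
        "requestID": request_id,    # chosen by the server, differs per reply
        "responseTo": response_to,  # echoes the request ID that we sent
        "opCode": op_code,          # 1 for OP_REPLY
    }


def fake_reply(server_request_id, response_to):
    """Build a synthetic OP_REPLY header followed by a placeholder body."""
    body = b"\x00\x00\x00\x00"  # stand-in only, not a real reply body
    header = struct.pack("<iiii", 16 + len(body), server_request_id, response_to, 1)
    return header + body


def differing_offsets(a, b):
    """Byte offsets at which two equal-length messages differ."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


if __name__ == "__main__":
    first = fake_reply(server_request_id=0x0101, response_to=1)
    second = fake_reply(server_request_id=0x0202, response_to=1)
    print(parse_reply_header(first))
    print(differing_offsets(first, second))
```

In this sketch the two replies differ only inside bytes 4 to 7, the requestID field. Everything from byte 8 onwards is identical as long as the check always sends the same request ID.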
EDIT: Sep 8th 2014. Thanks to comments on this Q&A from Baptiste and Felix, I went back to re-test the partial binary match, which seemed to fail initially. It looks like that was just a case of me transcribing the binary for the response incorrectly, so I have amended the answer to reflect that.
Matching on the "ok" string is just a basic check: any such response means that the server in question is still responding, but such a limited check is somewhat unsatisfying. While a full response check is not possible, everything after the request ID is usable.
Hence, here is the working binary check for the usable part of the response broken down, again using Wireshark to tease out the pieces as above:
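As a sketch of what that usable part contains, the Python below rebuilds the bytes that follow the request ID for an OP_REPLY carrying a single { ok: 1.0 } document. The helper names are hypothetical, and the responseFlags default of 8 is a guess that should be checked against your own capture, since it may differ between server versions and between mongod and mongos. The request ID passed in must be the one your check sends, because the server echoes it back in responseTo.

```python
import struct

OP_REPLY = 1


def bson_ok_document():
    """Encode {ok: 1.0} as BSON; the value is a double (type 0x01)."""
    element = b"\x01" + b"ok\x00" + struct.pack("<d", 1.0)
    body = element + b"\x00"  # trailing document terminator
    return struct.pack("<i", len(body) + 4) + body


def expected_reply_tail(request_id, response_flags=8):
    """Bytes expected after messageLength and the server's requestID."""
    tail = struct.pack("<i", request_id)       # responseTo
    tail += struct.pack("<i", OP_REPLY)        # opCode
    tail += struct.pack("<i", response_flags)  # responseFlags (assumed value)
    tail += struct.pack("<q", 0)               # cursorID
    tail += struct.pack("<i", 0)               # startingFrom
    tail += struct.pack("<i", 1)               # numberReturned
    return tail + bson_ok_document()


def reply_matches(reply, request_id, response_flags=8):
    """Compare a captured reply, minus its first 8 bytes, to the expected tail."""
    return reply[8:] == expected_reply_tail(request_id, response_flags)


if __name__ == "__main__":
    # One long hex string, suitable as the argument to tcp-check expect binary.
    print(expected_reply_tail(request_id=1).hex())
```

Splitting the printed hex at the field boundaries shown in the comments gives the same kind of breakdown as the Wireshark view, and reply_matches can be run against bytes copied out of a capture before committing the pattern to the HAProxy configuration.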
All of the above was tested successfully with MongoDB 2.6.4 and HAProxy 1.5.3.