Ldap – High availability LDAPS with Fedora Directory Server

fedorahigh-availabilityldap

I've been asked to set up a HA LDAP architecture using Fedora Directory Server – the company currebnly uses Sun DS but wants to move away from Sun.

I want to use a network hardware loadbalancer (Cisco) so that clients can just use 'ldap.business.com' as the LDAP server name, with the real IPs of the 4 servers behind it hidden.

For plain LDAP this works well, but now I want to add LDAPS using TLS. The certificate setup process seems well documented on the Fedora site but I'm unsure how to setup LDAPS to be highly available as star certificate is not allowed I think.

Round-robin DNS won't be robust enough as it is reliant on TTL – it is still possible to have an LDAP outage.

Best Answer

We handle HA LDAPS on Novell eDirectory, but the problem is similar. We've managed to solve the problem with subject-alternate-names on the certificates. Subject-alternate-names are, simply, alternate Subjects you can put on a certificate in order to give it more than one name. That's how you can (in theory) have a single certificate that is valid for pop3.organisation.org, imap.organisation.org, and webmail.organisation.org. They're fairly new, but not as new as Extended Validation certificates.

Most modern LDAP clients are smart enough to treat SAN's correctly. Also, we manage the certificate authority that minted the certificates so getting a SAN is simple for us. It isn't so simple if you're going to end up paying for a certificate, the CA's would rather you purchase multiple certificates. Unfortunately for a lot of people, some software packages can only load one certificate. This is where SAN's come in.

We use a hardware load-balancer (F5 BiP) and three LDAPS servers. When we first set it up we created certificates with just the network name of the load-balancer IP/DNS. Clients connecting directly to the LDAP servers got certificate errors, which proved to be an incentive to get people to use the load-balancer IP address like they should have been doing all along.

We've since moved to using the subject-alternate-names as it was having some negative side effects with the Novell software running on those servers. But we did get it running without SAN's for a while. Each certificate has three names on it:

IP Address of the load-balancer IP
DNS name of the load-balancer IP
DNS name of the direct host

This does expose the back-end hostname to those who snoop, but we don't consider this a vulnerability. Others might.

That's what we do, and it works for us.

Related Solutions

Replicating beanstalkd for High Availability

Since it is writing to disk via binlog, I'd think you could do something similar to what MySQL admins typically do: heartbeat w/ DRBD (example here).

The last time I tried to use heartbeat though, it didn't support non-multicast checking between nodes, meaning that it was more or less impossible to run on cloud/VPS infrastructure (AWS, Linode, Slicehost, etc). In fact, most clustering services use multicast. This may no longer be the case, but it's something to be aware of. You may be able to use keepalived to provide ip-based failover, which also only supports multicast BUT has a patch available via Willy Tarreau (author of HAProxy) to add unicast support. I have personally tested this on a pair of Linode VPS servers and keepalived is able to failover a shared IP address in the event of the master server failing.

One thing you can do which is probably less optimal is to write jobs to a number of beanstalkd servers (aka partitioning). If one of them goes down, have your app detect this and write to the other instance(s) instead. Your workers will have to intelligently poll each of the beanstalkd instances and be able to ignore dead instances. Since you are binlogging, bringing an instance back up should be as easy as restarting it and the app/workers will detect this and continue as usual (and begin processing the jobs in the newly-started instance). I'm obviously simplifying the process, but that's one other way to handle it.

Options for Multisite High Availability with Puppet

Puppet actually lends itself pretty well to multi-master environments, with caveats. The main one? Lots of parts of Puppet like to be centralized. The certificate authority, the inventory and dashboard/report services, filebucketing and stored configs - all of them are at their best in (or simply require) a setup where there's just one place for them to talk to.

It's quite workable, though, to get a lot of those moving parts working in a multi-master environment, if you're ok with the graceful loss of some of the functionality when you've lost your primary site.

Let's start with the base functionality to get a node reporting to a master:

Modules and Manifests

This part's simple. Version control them. If it's a distributed version control system, then just centralize and sync, and alter your push/pull flow as needed in the failover site. If it's Subversion, then you'll probably want to svnsync the repo to your failover site.

Certificate Authority

One option here is to simply sync the certificate authority files between the masters, so that all share the same root cert and are capable of signing certificates. This has always struck me as "doing it wrong";

Should one master really see its own cert presented in client auth for an incoming connection from another master as valid?
Will that reliably work for the inventory service, dashboard, etc?
How do you add additional valid DNS alt names down the road?

I can't honestly say that I've done thorough testing of this option, since it seems horrible. However, it seems that Puppet Labs are not looking to encourage this option, per the note here.

So, what that leaves is to have a central CA master. All trust relationships remain working when the CA is down since all clients and other masters cache the CA certificate and the CRL (though they don't refresh the CRL as often as they should), but you'll be unable to sign new certificates until you get the primary site back up or restore the CA master from backups at the failover site.

You'll pick one master to act as CA, and have all other masters disable it:

[main]
    ca_server = puppet-ca.example.com
[master]
    ca = false

Then, you'll want that central system to get all of the certificate related traffic. There are a few options for this;

Use the new SRV record support in 3.0 to point all agent nodes to the right place for the CA - _x-puppet-ca._tcp.example.com
Set up the ca_server config option in the puppet.conf of all agents

Proxy all traffic for CA-related requests from agents on to the correct master. For instance, if you're running all your masters in Apache via Passenger, then configure this on the non-CAs:

SSLProxyEngine On
# Proxy on to the CA.
ProxyPassMatch ^/([^/]+/certificate.*)$ https://puppet-ca.example.com:8140/$1
# Caveat: /certificate_revocation_list requires authentication by default,
# which will be lost when proxying. You'll want to alter your CA's auth.conf
# to allow those requests from any device; the CRL isn't sensitive.

And, that should do it.

Before we move on to the ancillary services, a side note;

DNS Names for Master Certificates

I think this right here is the most compelling reason to move to 3.0. Say you want to point a node at "any ol' working master".

Under 2.7, you'd need a generic DNS name like puppet.example.com, and all of the masters need this in their certificate. That means setting dns_alt_names in their config, re-issuing the cert that they had before they were configured as a master, re-issuing the cert again when you need to add a new DNS name to the list (like if you wanted multiple DNS names to have agents prefer masters in their site).. ugly.

With 3.0, you can use SRV records. Give all your clients this;

[main]
    use_srv_records = true
    srv_domain = example.com

Then, no special certs needed for the masters - just add a new record to your SRV RR at _x-puppet._tcp.example.com and you're set, it's a live master in the group. Better yet, you can easily make the master selection logic more sophisticated; "any ol' working master, but prefer the one in your site" by setting up different sets of SRV records for different sites; no dns_alt_names needed.

Reports / Dashboard

This one works out best centralized, but if you can live without it when your primary site's down, then no problem. Just configure all of your masters with the correct place to put the reports..

[master]
    reports = http
    reporturl = https://puppetdash.example.com/reports/upload

..and you're all set. Failure to upload a report is non-fatal for the configuration run; it'll just be lost if the dashboard server's toast.

Fact Inventory

Another nice thing to have glued into your dashboard is the inventory service. With the facts_terminus set to rest as recommended in the documentation, this'll actually break configuration runs when the central inventory service is down. The trick here is to use the inventory_service terminus on the non-central masters, which allows for graceful failure..

facts_terminus = inventory_service
inventory_server = puppet-ca.example.com
inventory_port = 8140

Have your central inventory server set to store the inventory data through either ActiveRecord or PuppetDB, and it should keep up to date whenever the service is available.

So - if you're ok with being down to a pretty barebones config management environment where you can't even use the CA to sign a new node's cert until it's restored, then this can work just fine - though it'd be really nice if some of these components were a bit more friendly to being distributed.