Cheap, reliable high availability solution

drbd high-availability

We're looking for a way to improve the reliability of one of our servers (Apache/MySQL/Virtualmin setup).
So far, we've had every possible kind of clusterfuck with that server in the past six months (DNS failure, DDoS, Dom0 failure, network outage, DomU failure, …; on a good day, two at once). All were resolved in less than a day, but it's still worrisome: there are about 50 customer websites on that host, and the customers are at our throats every time the server is down. The server's availability is still above the contractually guaranteed 99%, but, well… people remember the 5 occasions the server was down, not the 360 days it was up.
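For scale, here is a minimal sketch of what a 99% guarantee allows in downtime per year. The function name and the 365-day year are illustrative assumptions, not anything taken from the contract.

```python
# Downtime budget implied by an availability guarantee.
# Assumption: the guarantee is measured over a 365-day year.
HOURS_PER_YEAR = 365 * 24


def downtime_budget_hours(availability: float,
                          period_hours: float = HOURS_PER_YEAR) -> float:
    """Return the maximum downtime (in hours) allowed within the period."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * period_hours


if __name__ == "__main__":
    for level in (0.99, 0.999, 0.9999):
        hours = downtime_budget_hours(level)
        print(f"{level:.2%}: {hours:.1f} h/year ({hours / 24:.2f} days)")
```

At 99% the budget is about 87.6 hours (roughly 3.65 days) per year, so a handful of outages that each last less than a day can still fit inside it.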

Plans so far:

  1. Backup DNS server (shouldn't be much of a problem)
  2. High-Availability setup for the server itself. The problem here is data replication to the secondary host.

The hosts would be in different data centers (Hetzner, btw.), so we'd have rather limited bandwidth (a 100 Mbit/s uplink, and there should be at least some bandwidth left for the actual users…), and data encryption is more or less a fixed requirement.
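To put rough numbers on that constraint, here is a back-of-the-envelope sketch. The share reserved for users (50%) and the data volume (100 GB) are purely hypothetical figures, and the calculation ignores protocol and encryption overhead.

```python
# Rough replication budget on a shared uplink.
# All inputs below are illustrative assumptions, not measured values.


def replication_budget_mb_per_s(link_mbit: float,
                                reserved_fraction: float) -> float:
    """MB/s left for replication after reserving a share for user traffic."""
    if not 0.0 <= reserved_fraction <= 1.0:
        raise ValueError("reserved_fraction must be between 0 and 1")
    usable_mbit = link_mbit * (1.0 - reserved_fraction)
    return usable_mbit / 8.0  # 8 bits per byte


def full_sync_hours(data_gb: float, mb_per_s: float) -> float:
    """Hours needed to copy data_gb (decimal GB) at the given rate."""
    if mb_per_s <= 0:
        raise ValueError("mb_per_s must be positive")
    return data_gb * 1000.0 / mb_per_s / 3600.0


if __name__ == "__main__":
    rate = replication_budget_mb_per_s(link_mbit=100, reserved_fraction=0.5)
    print(f"replication budget: {rate:.2f} MB/s")
    print(f"initial sync of 100 GB: {full_sync_hours(100, rate):.1f} h")
```

Under those assumptions, sustained writes above about 6 MB/s would leave the secondary falling behind, which is the core of the bandwidth concern.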

DRBD itself scales poorly over WAN, nor does it provide encryption. DRBD Proxy claims to solve the bandwidth problem (but not the encryption problem, as far as I can see), but from what I've read it's simply too expensive: $5k/year is too much (I'm pretty sure that's more than what we're earning with that server).

On the other hand, from my personal experience, OpenVPN/SSH tunnels are not reliable enough to guarantee we wouldn't have false alerts triggering unnecessary failovers (never mind the overhead reducing hard disk performance even more).

So… what alternatives are there? Or am I simply overlooking something?

Edit: To clarify, I'd prefer replication at the file-system/block-device level. Application-level replication is possible, but I'd rather have one replication solution running than one for each application.

Best Answer

You seem somewhat stuck on DRBD replication, and I think the trouble is that it does not suit your needs. It replicates block devices and is quite bandwidth-intensive (although link compression might alleviate that quite a bit). Consider whether you would be happier with replication at a higher level, such as MySQL's replication mechanisms for the databases and something like lsyncd for the filesystems.

Gluing it together with stuff from the linux-ha project or setting up a semiautomatic or manual failover mechanism in conjunction with some monitoring is surely a bit of work, but should give you what you want in the long run.

Of course, you would still need an encrypted tunnel for the traffic, but I do not understand your reluctance to use OpenVPN. The tunnel is only there for the sake of the backup/standby system, and you would have either a witness (in an HA setup with automatic failover) or a monitoring system (in a setup with monitoring) that is independent of the presence of the tunnel. A tunnel outage would therefore not cause a failover; you would just get an alarm telling you to fix the tunnel. Fixing it is of course necessary, because otherwise you lose the ability to fail over to an up-to-date standby system.
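The decision logic described here can be sketched roughly as follows. This is only an illustration: `Observation`, `Action`, `decide`, and the three-strikes threshold are hypothetical names and values, and in a real setup this role would be filled by the cluster manager or monitoring system rather than hand-written code.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    NONE = "nothing to do"
    WAIT = "primary check failed, waiting for confirmation"
    FAILOVER = "fail over to standby"
    ALERT_TUNNEL = "alert: repair replication tunnel"
    ALERT_STANDBY = "alert: standby unreachable"


@dataclass(frozen=True)
class Observation:
    """What the witness/monitor sees over the public network."""
    primary_reachable: bool
    standby_reachable: bool
    tunnel_up: bool  # state of the replication tunnel only


def decide(obs: Observation, consecutive_primary_failures: int,
           threshold: int = 3) -> Action:
    """Pick an action; the tunnel state alone never triggers a failover."""
    if not obs.primary_reachable:
        if consecutive_primary_failures < threshold:
            return Action.WAIT  # avoid failing over on a single blip
        if obs.standby_reachable:
            return Action.FAILOVER
        return Action.ALERT_STANDBY  # nothing to fail over to
    if not obs.tunnel_up:
        return Action.ALERT_TUNNEL  # standby is going stale; fix it
    if not obs.standby_reachable:
        return Action.ALERT_STANDBY
    return Action.NONE


if __name__ == "__main__":
    tunnel_outage = Observation(primary_reachable=True,
                                standby_reachable=True, tunnel_up=False)
    print(decide(tunnel_outage, consecutive_primary_failures=0).value)
```

The point of the design is that `tunnel_up` only ever produces an alert, while a failover requires the witness itself to lose sight of the primary several times in a row.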
