I work for a small company that provides a web application to thousands of users. Earlier this year we had a single server hosted by one company. We recently acquired another server in a different location, with the hope of one day making it a redundant failover machine. I understand what to do with the MySQL replication (I plan on using a master-master replication setup, and rsync to sync the scripts and files), however I am at a standstill about how to configure the failover. Ideally I would like both machines to accept requests, like round-robin DNS, but if one machine goes down I do not want requests to go to that machine. All of the solutions I have come across assume high availability of servers in the same location; these servers are in two completely different locations with different public IP addresses. Any help would be great. Thanks
Highly Available Web Application (LAMP)
apache-2.2 high-availability lamp
Related Solutions
Disclaimer: You'd be mad to listen to me without doing a tonne of testing AND getting a 2nd opinion from someone qualified - I'm new to this game.
The efficiency improvement idea proposed in this question won't work. The main mistake I made was to think that the order in which the memcached stores are defined in the pool dictates some kind of priority. This is not the case. When you define a pool of memcached daemons (e.g. using session.save_path="tcp://192.168.0.1:11211, tcp://192.168.0.2:11211") you can't know which store will be used. Data is distributed evenly, meaning that an item might be stored in the first, or it could be in the last (or in both, if the memcache client is configured to replicate; note that it is the client that handles replication, the memcached server does not do it itself). Either way, using localhost as the first entry in the pool won't improve performance: there is a 50% chance of hitting either store.
Having done a little bit of testing and research, I have concluded that you CAN share sessions across servers using memcache, BUT you probably don't want to: it doesn't seem to be popular because it doesn't scale as well as using a shared database, and it is not as robust. I'd appreciate feedback on this so I can learn more...
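To illustrate why pool order is not a priority list, here is a toy sketch (not the real PHP memcache client, and the modulo strategy is a simplification of its hashing) showing that a key-hashing client spreads items across the pool regardless of which server is listed first:

```python
# Toy model of a hashing memcache client: the server for each key is
# chosen by hashing the key, so listing localhost first gives it no
# preference -- items land on both stores roughly evenly.
import hashlib

POOL = ["192.168.0.1:11211", "192.168.0.2:11211"]

def server_for(key, pool=POOL):
    """Pick a server by hashing the key (simplified modulo strategy)."""
    digest = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return pool[digest % len(pool)]

# Count where 1000 session IDs would be stored.
counts = {s: 0 for s in POOL}
for i in range(1000):
    counts[server_for("sess_%d" % i)] += 1
print(counts)  # roughly a 50/50 split between the two stores
```

With an even split like this, a web server has about a 50% chance of needing a network hop even if its own memcached daemon is first in the list, which is the point made above.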
Ignore the following unless you have a PHP app:
Tip 1: If you want to share sessions across 2 servers using memcache:
Ensure you answered Yes to "Enable memcache session handler support?" when you installed the PHP memcache client, and add the following to your /etc/php.d/memcache.ini file:
session.save_handler = memcache
On webserver 1 (IP: 192.168.0.1):
session.save_path="tcp://192.168.0.1:11211"
On webserver 2 (IP: 192.168.0.2):
session.save_path="tcp://192.168.0.1:11211"
Tip 2: If you want to share sessions across 2 servers using memcache AND have failover support:
Add the following to your /etc/php.d/memcache.ini file:
memcache.hash_strategy = consistent
memcache.allow_failover = 1
On webserver 1 (IP: 192.168.0.1):
session.save_path="tcp://192.168.0.1:11211, tcp://192.168.0.2:11211"
On webserver 2 (IP: 192.168.0.2):
session.save_path="tcp://192.168.0.1:11211, tcp://192.168.0.2:11211"
Notes:
- This highlights another mistake I made in the original question: I wasn't using an identical session.save_path on all servers.
- In this case "failover" means that, should one memcache daemon fail, the PHP memcache client will start using the other one, i.e. anyone who had their session in the store that failed will be logged out. It is not transparent failover.
Tip 3: If you want to share sessions using memcache AND have transparent failover support:
Same as tip 2, except you need to add the following to your /etc/php.d/memcache.ini file:
memcache.session_redundancy=2
Notes:
- This makes the PHP memcache client write the sessions to 2 servers. You get redundancy (like RAID-1): writes are sent to n mirrors, and failed gets are retried on the mirrors. This means users do not lose their session if one memcache daemon fails.
- Mirrored writes are done in parallel (using non-blocking I/O), so performance shouldn't degrade much as the number of mirrors increases. However, network traffic will increase if your memcache mirrors are distributed across different machines. For example, there is no longer a 50% chance of using localhost and avoiding network access.
- Apparently, the delay in write replication can cause old data to be retrieved instead of a cache miss. The question is whether this matters to your application: how often do you write session data?
- memcache.session_redundancy is for session redundancy, but there is also a memcache.redundancy ini option that can be used by your PHP application code if you want it to have a different level of redundancy.
- You need a recent version (still in beta at this time) of the PHP memcache client; version 3.0.3 from pecl worked for me.
Thanks to the Pacemaker mailing list, we have a solution. The problem is that the LSB script for 389 doesn't understand the concept of master/slave. The easiest solution is to use a simple clone, rather than a master/slave clone. New Pacemaker configuration looks like the following:
property stonith-enabled=false
property no-quorum-policy=ignore
rsc_defaults resource-stickiness=100
primitive elastic_ip lsb:elastic-ip op monitor interval="10s"
primitive dirsrv lsb:dirsrv op monitor interval="15s" role="Slave" timeout="10s" op monitor interval="16s" role="Master" timeout="10s"
clone ldap-clone dirsrv
order ldap-after-eip inf: elastic_ip ldap-clone
colocation ldap-with-eip inf: elastic_ip ldap-clone
Best Answer
Typically, heartbeat (pacemaker) or MMM is used to manage an IP resource that would fail over dynamically. For that to work effectively, you need to share the same network segment.
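For the shared-segment case, the floating IP would typically be a cluster resource such as the stock IPaddr2 agent. A minimal sketch in the same crm syntax used below (the IP and netmask are placeholders you would replace):

```
primitive vip ocf:heartbeat:IPaddr2 params ip="192.168.0.100" cidr_netmask="24" op monitor interval="10s"
```

This is exactly what will not work across two sites with different public IPs: the address can only float within one network segment.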
If the servers are not in the same physical space, monitoring over two disparate Internet links is far more fallible than a heartbeat link over a serial cable a few feet long between machines in the same rack.
You will need to measure the risk and prioritize based on your needs. What's your top priority: availability or data integrity? If data integrity is not a priority, you could potentially fail over automatically, but you still risk partitioning. The CAP theorem explores this in greater detail.
It's generally not advised to write to both master servers at the same time, as there can be auto-increment ID conflicts. You can configure an offset, but this needs to be considered in the context of your entire architecture.
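The offset mentioned here refers to MySQL's auto-increment settings. A typical sketch for two masters (assuming you apply it in each server's my.cnf) so they can never hand out the same ID:

```ini
# Master 1 my.cnf -- generates IDs 1, 3, 5, ...
[mysqld]
auto_increment_increment = 2
auto_increment_offset    = 1

# Master 2 my.cnf -- generates IDs 2, 4, 6, ...
[mysqld]
auto_increment_increment = 2
auto_increment_offset    = 2
```

This avoids key collisions on auto-increment columns, but it does not resolve conflicting updates to the same row, which is why writing to a single master at a time is still recommended.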
Based on what I know from what you described, I'd probably lean towards data integrity. I'd set up dual masters but only write to one master IP from your application. In case of a failure on the primary, the manual failover procedure would be to repoint the web application to the secondary database.
If you insisted on automatic failover, you could write a script that considers more than two failure points and minimizes the data risk with additional logic. This architecture is substantially more complicated, however, and you would have to design some of it yourself.
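A hypothetical sketch of the "more than two failure points" idea: only trigger failover when a majority of independent probes agree the primary is down, which reduces the chance of a split-brain flip caused by a single flaky network path. The function name and probe representation are invented for illustration:

```python
# Decide whether to fail over based on several independent health probes
# of the primary (e.g. checks run from different networks/vantage points).
def should_fail_over(probe_results):
    """probe_results: list of booleans, True = that probe saw the primary alive."""
    if len(probe_results) < 3:
        # With fewer than three vantage points we cannot distinguish
        # "primary is down" from "our own link is down" -- stay put.
        return False
    failures = sum(1 for alive in probe_results if not alive)
    # Fail over only on a strict majority of failed probes.
    return failures > len(probe_results) / 2
```

For example, two failed probes out of three would trigger failover, while one failed probe out of three (likely a local network blip) would not. A real version would also have to fence or demote the old primary before repointing writes, which is the hard part the answer alludes to.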
There's a variety of technologies available, from MySQL Cluster (NDB) to Google's patches, but nothing completely escapes the trade-offs described by the CAP theorem.