I have been reviewing XtraDB clustering and built a proof-of-concept environment on OpenStack using 4 instances, which has fallen over during my resilience testing.
Per the PXC documentation (http://www.percona.com/doc/percona-xtradb-cluster/howtos/virt_sandbox.html), which covers a 3-node install, I opted for a 4th node.
- Initial setup completed and the data-loading tests passed, with all nodes being updated synchronously while a 1.6 GB test SQL file was used to load a database.
- Failure and restore testing of nodes commenced. Each test entailed stopping the mysql service on a node, creating and subsequently dropping a database to verify replication on the surviving nodes, and then starting the downed node so it could resync (see the commands sketched after this list).
- This worked fine for nodes 4, 3 and 2.
- Node1, which per the PXC documents is essentially a controller, would not rejoin the cluster.
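For reference, the per-node test cycle looked roughly like this (the service name and the test database name are illustrative, not the exact commands run):

    # On the node being failed:
    sudo service mysql stop

    # On any surviving node, exercise replication:
    mysql -u root -p -e "CREATE DATABASE failover_test;"
    mysql -u root -p -e "DROP DATABASE failover_test;"

    # Back on the downed node, bring it up and let it resync:
    sudo service mysql start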
So my questions are as follows:
- How do I return a controller node to service if the surviving nodes have since had data written to them?
- Using 4 nodes as a reference, is there a way to remove this single point of failure in node1? (If a surviving node restarts while the controller (node1) is down/out of sync, that node will also fail.)
Best Answer
Based on your symptoms on node1, you are using an empty cluster address in your configuration file, which means the node will start a new cluster.
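In my.cnf that typically looks like this (the standard Galera bootstrap form; your file may differ):

    [mysqld]
    # An empty gcomm:// address tells the node to bootstrap a brand-new
    # cluster instead of joining an existing one.
    wsrep_cluster_address=gcomm://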
You can confirm this by checking the wsrep_cluster_size status variable, which will be 1 on node1 and 3 on the others. If you want node1 to join your already existing cluster, you should specify the address of at least one running cluster member instead.
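For example (the IP address below is a placeholder for one of your own nodes):

    [mysqld]
    # Point node1 at a running member of the existing cluster so it joins
    # that cluster rather than starting one of its own.
    wsrep_cluster_address=gcomm://192.168.0.2

You can verify the result on any node with:

    mysql -e "SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size';"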
In this case, node1 will rejoin the cluster.
Some additional thoughts:
Because of the quorum mechanism in PXC (Percona XtraDB Cluster), it's not recommended to run it on 4 nodes. Use an odd number of nodes so that, in case of a network split, one part of the cluster can still hold a majority; with 4 nodes, a 2-2 split leaves neither half with more than 50% of the votes, so both halves become non-primary and stop accepting writes.
Instead of wsrep_cluster_address, you can use wsrep_urls in the [mysqld_safe] section.
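A sketch of that alternative (again with placeholder IPs; 4567 is the default Galera communication port):

    [mysqld_safe]
    # mysqld_safe tries each address in order at startup, and the node
    # joins the first cluster member it can reach.
    wsrep_urls=gcomm://192.168.0.1:4567,gcomm://192.168.0.2:4567,gcomm://192.168.0.3:4567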
Disclaimer: I work for Percona.