I have been reviewing XtraDB clustering and built a proof-of-concept environment on OpenStack using 4 instances, which has fallen over during my resilience testing.
Per the PXC documentation (http://www.percona.com/doc/percona-xtradb-cluster/howtos/virt_sandbox.html), which covers a 3-node install, I opted for a 4th node.
- Initial setup completed and the data-loading tests passed, with all nodes being updated synchronously while a 1.6 GB test SQL file was used to load a database.
- Failure and restore testing of nodes commenced. Each test entailed stopping the mysql service on a node, creating and subsequently dropping a database to verify replication on the surviving nodes, and then starting the downed node so it could resync (see the commands sketched after this list).
- This worked fine for nodes 4, 3 and 2.
- Node1, which per the PXC documents is essentially a controller, would not rejoin the cluster.
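For reference, the per-node test cycle looked roughly like this (the service name and the test database name are illustrative, not the exact commands run):

    # On the node being failed:
    sudo service mysql stop

    # On any surviving node, exercise replication:
    mysql -u root -p -e "CREATE DATABASE failover_test;"
    mysql -u root -p -e "DROP DATABASE failover_test;"

    # Back on the downed node, bring it up and let it resync:
    sudo service mysql start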
So my questions are as follows:
- How do I return a controller node to service if the surviving nodes have since had data written to them?
- Using 4 nodes as a reference, is there a way to remove this single point of failure in node1? (If a surviving node restarts while the controller (node1) is down/out of sync, that node will also fail.)
Best Answer
Based on your symptoms on node1, you are using an empty cluster address in your configuration file, which means the node will start a new cluster.
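In my.cnf that typically looks like this (the standard Galera bootstrap form; your file may differ):

    [mysqld]
    # An empty gcomm:// address tells the node to bootstrap a brand-new
    # cluster instead of joining an existing one.
    wsrep_cluster_address=gcomm://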
You can confirm this by checking the wsrep_cluster_size status variable, which will be 1 on node1 and 3 on the others. If you want node1 to join your already existing cluster, you should specify the address of at least one running cluster member instead.
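For example (the IP address below is a placeholder for one of your own nodes):

    [mysqld]
    # Point node1 at a running member of the existing cluster so it joins
    # that cluster rather than starting one of its own.
    wsrep_cluster_address=gcomm://192.168.0.2

You can verify the result on any node with:

    mysql -e "SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size';"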
In this case, node1 will rejoin the cluster.
Some additional thoughts:
Because of the quorum mechanism in PXC (Percona XtraDB Cluster), it's not recommended to run it on 4 nodes. Use an odd number of nodes so that, in case of a network split, one part of the cluster can still hold a majority; with 4 nodes, a 2-2 split leaves neither half with more than 50% of the votes, so both halves become non-primary and stop accepting writes.
Instead of wsrep_cluster_address, you can use wsrep_urls in the [mysqld_safe] section.
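A sketch of that alternative (again with placeholder IPs; 4567 is the default Galera communication port):

    [mysqld_safe]
    # mysqld_safe tries each address in order at startup, and the node
    # joins the first cluster member it can reach.
    wsrep_urls=gcomm://192.168.0.1:4567,gcomm://192.168.0.2:4567,gcomm://192.168.0.3:4567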
Disclaimer: I work for Percona.