My Environtment :
-
Two Nodes – CentOS 6.4 x86_64
Node1 : 10.10.201.3
Node2 : 10.10.201.4
-
MariaDB-server-10.2.0-1.el6.x86_64
Both nodes are running normally, but after restarting mysql on Node1, it'll be failed to start again whilst mysql on Node2 is continuing running without problem.
Configuration on Node1 :
#/etc/my.cnf.d/server.cnf
node1
bind-address=10.10.201.3
datadir=/opt/mysql
socket=/opt/mysql/mysql.sock
handlersocket_address="10.10.201.3"
handlersocket_port="9998"
handlersocket_port_wr="9999"
open_files_limit = 25600
log-error=/opt/mysql/log/mysqld.log
[galera]
# Mandatory settings
wsrep_on=ON
wsrep_provider=/usr/lib64/galera/libgalera_smm.so
wsrep_cluster_address="gcomm://10.10.201.4,10.10.201.3"
wsrep_cluster_name='galera_cluster_www'
wsrep_node_address='10.10.201.3'
wsrep_node_name='www-node1'
wsrep_sst_method=rsync
wsrep_sst_auth=sst_user:password
wsrep-slave-threads=16
binlog_format=ROW
default_storage_engine=InnoDB
innodb_autoinc_lock_mode=2
wsrep_log_conflicts=ON
wsrep_provider_options="cert.log_conflicts=ON"
wsrep_debug=ON
wsrep_max_ws_size = 2G
binlog_row_image = minimal
Configuration on Node2 :
#/etc/my.cnf.d/server.cnf
node2
bind-address=10.10.201.4
datadir=/opt/mysql
socket=/opt/mysql/mysql.sock
handlersocket_address="10.10.201.4"
handlersocket_port="9998"
handlersocket_port_wr="9999"
open_files_limit = 25600
[galera]
# Mandatory settings
wsrep_on=ON
wsrep_provider=/usr/lib64/galera/libgalera_smm.so
wsrep_cluster_address="gcomm://10.10.201.3,10.10.201.4"
wsrep_cluster_name='galera_cluster_www'
wsrep_node_address='10.10.201.4'
wsrep_node_name='www-node2'
wsrep_sst_method=rsync
wsrep_sst_auth=sst_user:password
wsrep-slave-threads=16
binlog_format=ROW
default_storage_engine=InnoDB
innodb_autoinc_lock_mode=2
wsrep_log_conflicts=ON
wsrep_provider_options="cert.log_conflicts=ON"
wsrep_debug=ON
wsrep_max_ws_size = 2G
binlog_row_image = minimal
And finally, Cluster Information on Node2 after failing mysql on the first node (Node1) :
MariaDB [(none)]> show status like 'wsrep%';
+------------------------------+--------------------------------------+
| Variable_name | Value |
+------------------------------+--------------------------------------+
| wsrep_apply_oooe | 0.017353 |
| wsrep_apply_oool | 0.000050 |
| wsrep_apply_window | 1.021550 |
| wsrep_causal_reads | 0 |
| wsrep_cert_deps_distance | 24.564685 |
| wsrep_cert_index_size | 48 |
| wsrep_cert_interval | 0.021750 |
| wsrep_cluster_conf_id | 69 |
| wsrep_cluster_size | 1 |
| wsrep_cluster_state_uuid | c07f825f-132f-11e6-b219-d7e841605104 |
| wsrep_cluster_status | Primary |
| wsrep_commit_oooe | 0.000000 |
| wsrep_commit_oool | 0.000034 |
| wsrep_commit_window | 1.005403 |
| wsrep_connected | ON |
| wsrep_evs_delayed | |
| wsrep_evs_evict_list | |
| wsrep_evs_repl_latency | 0/0/0/0/0 |
| wsrep_evs_state | OPERATIONAL |
| wsrep_flow_control_paused | 0.000000 |
| wsrep_flow_control_paused_ns | 0 |
| wsrep_flow_control_recv | 0 |
| wsrep_flow_control_sent | 0 |
| wsrep_gcomm_uuid | 401f6755-71da-11e6-8244-9e88079ed6c4 |
| wsrep_incoming_addresses | 10.10.201.4:3306 |
| wsrep_last_committed | 2364263 |
| wsrep_local_bf_aborts | 116 |
| wsrep_local_cached_downto | 2221069 |
| wsrep_local_cert_failures | 23 |
| wsrep_local_commits | 729390 |
| wsrep_local_index | 0 |
| wsrep_local_recv_queue | 0 |
| wsrep_local_recv_queue_avg | 0.004725 |
| wsrep_local_recv_queue_max | 6 |
| wsrep_local_recv_queue_min | 0 |
| wsrep_local_replays | 112 |
| wsrep_local_send_queue | 0 |
| wsrep_local_send_queue_avg | 0.000335 |
| wsrep_local_send_queue_max | 2 |
| wsrep_local_send_queue_min | 0 |
| wsrep_local_state | 4 |
| wsrep_local_state_comment | Synced |
| wsrep_local_state_uuid | c07f825f-132f-11e6-b219-d7e841605104 |
| wsrep_protocol_version | 7 |
| wsrep_provider_name | Galera |
| wsrep_provider_vendor | Codership Oy <info@codership.com> |
| wsrep_provider_version | 25.3.15(r3578) |
| wsrep_ready | ON |
| wsrep_received | 1376816 |
| wsrep_received_bytes | 630752657 |
| wsrep_repl_data_bytes | 303429595 |
| wsrep_repl_keys | 3039261 |
| wsrep_repl_keys_bytes | 41097380 |
| wsrep_repl_other_bytes | 0 |
| wsrep_replicated | 729452 |
| wsrep_replicated_bytes | 391211903 |
| wsrep_thread_count | 17 |
+------------------------------+--------------------------------------+
Best Answer
I had the same problem, and finally after fixing the issue ( on CentOS 7 - MariaDB-server-10.2.0-1 ), I wrote a documentation on how to set it up correctly and I hope it will fix yours too. Follow the instructions below and try to build your Galera nodes from scratch. Notice that I will just use the mandatory configuration, you can add yours to it later. I guess you have missed the fifth step or you haven't done it correctly. Anyway, I will write all the steps so anyone else can find it easier to follow.
The problem arises when you do not use the
galera_new_cluster
command on the master node, and you do not use the appropriate address forwsrep_cluster_address
- gcomm. So when the master fails, other nodes cannot come back to the peer. (not even the master can come back in cluster)Consider 3 servers named
GLR{1,2,3}
and we are going to setup Galera Cluster on each. - I will explain how to avoid failure on two-node cluster in the seventh step.STEP 1
For installation:
Open
/etc/yum.repos.d/mariadb.repo
with your favourite editor and add the following lines into it:STEP 2
If you do not know how to manage/configure SELinux, set it to permissive mode and check your log files after finishing the installation to do the required steps for managing it. You might also be in need of having
setroubleshoot-server
andsetools-console
packages installed to better understand your SELinux logs.But if you have SELinux enabled and do not want to set it to permissive mode, you should note that it may block mysqld from carrying out required operations. So you should configure it to allow mysqld to run external programs and open listen sockets on unprivileged ports—that is, things that an unprivileged user can do.
Teaching how to manage SELinux is beyond the scope of this answer, but you can set it in permissive mode only for
mysql
requests by doing the following command:STEP 3
After installation using
yum
, Add the following lines to the end of /etc/my.cnf.d/server.cnf as shown below on each GLR server respectively:[GLR1] ↴
[GLR2] ↴
[GLR3] ↴
STEP 4
Reboot all servers.
STEP 5
Use the following command on GLR1 only, and then restart mariadb.service on GLR2 and GLR3:
STEP 6
As you noticed in your question, you can test connectivity between servers by using the following command:
Or just check the cluster size:
STEP 7
On the other hand, after finishing all the steps above, you can use this Article provided by galeracluster website on how to a avoid a single-node failure causing the other to stop working if you wanna use a TWO-NODE cluster.