MariaDB Galera cluster – 3rd node cannot start/connect


I am setting up a MariaDB Galera cluster. The first two nodes went fine: both are started and connected.

The third node, however, won't start or connect.

# service mysql start
Starting mysql (via systemctl):  Job for mariadb.service failed because a timeout was exceeded. See "systemctl status mariadb.service" and "journalctl -xe" for details.
                                                           [FAILED]

journalctl -xe output:

Jan 19 09:16:07 host3.domain.com systemd[1]: mariadb.service start operation timed out. Terminating.
-- Subject: Unit session-c9591.scope has finished start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- The start-up result is done.

-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- 
-- Unit session-c9591.scope has begun starting up.
Jan 19 09:17:01 host3.domain.com CROND[1018]: (root) CMD (/usr/local/rtm/bin/rtm 8 > /dev/null 2> /dev/null)
Jan 19 09:17:38 host3.domain.com systemd[1]: mariadb.service stop-final-sigterm timed out. Skipping SIGKILL. Entering failed mode.
Jan 19 09:17:38 host3.domain.com systemd[1]: Failed to start MariaDB 10.1.30 database server.
-- Subject: Unit mariadb.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- 
-- Unit mariadb.service has failed.
-- 
-- The result is failed.
Jan 19 09:17:38 host3.domain.com systemd[1]: Unit mariadb.service entered failed state.
Jan 19 09:17:38 host3.domain.com systemd[1]: mariadb.service failed.
Jan 19 09:17:38 host3.domain.com polkitd[383]: Unregistered Authentication Agent for unix-process:25848:56441890 (system bus name :1.19233, object path /org/freedesktop/PolicyKit1/Authentic
Jan 19 09:17:51 host3.domain.com mysqld[25932]: 2018-01-19  9:17:51 114327532205824 [Note] WSREP: (15573658, 'tcp://0.0.0.0:4567') connection to peer 00000000 with addr  timed out, no messa
Jan 19 09:18:01 host3.domain.com systemd[1]: Started Session c9592 of user root.
-- Subject: Unit session-c9592.scope has finished start-up
-- Defined-By: systemd

I am not clear on why it is timing out while connecting to itself. Is there another log file I should generate that would offer more clues? I confirmed in syslog that the error is:

[Note] WSREP: (15573658, 'tcp://0.0.0.0:4567') connection to peer 00000000 with addr  timed out, no messa

The missing addr is an internal private IP.

Best Answer

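For me, the fix was simply to tear down the entire cluster and start it back up.

On each node:

# service mysql stop

On the most advanced node (typically the one that was shut down last, whose grastate.dat shows the highest seqno), bootstrap a new cluster:

# galera_new_cluster

On each subsequent node: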

# service mysql start
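
If it is not obvious which node is the most advanced, one way to compare them is to read the seqno that Galera saves in grastate.dat once every node is stopped. The following is only a rough sketch: the function name grastate_seqno is made up for illustration, and the default path /var/lib/mysql/grastate.dat assumes the stock datadir, so adjust it if yours differs.

```shell
# Print the seqno recorded in a Galera saved-state file.
# grastate_seqno is a hypothetical helper, not part of MariaDB or Galera.
# The default path assumes the datadir is /var/lib/mysql.
grastate_seqno() {
    # grastate.dat holds "key: value" lines; print the value of "seqno:"
    awk '$1 == "seqno:" { print $2 }' "${1:-/var/lib/mysql/grastate.dat}"
}
```

Run grastate_seqno on each stopped node and bootstrap from the one that reports the highest number. A seqno of -1 usually means the node did not shut down cleanly, in which case the file alone cannot tell you its position.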

Verify with:

MariaDB [(none)]> show global status like "%wsrep_cluster_size%";

+--------------------+-------+
| Variable_name      | Value |
+--------------------+-------+
| wsrep_cluster_size | 3     |
+--------------------+-------+
1 row in set (0.00 sec)
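
The same check can be scripted from the shell. This is a minimal sketch, not something I ran as part of the fix: the function names are invented for illustration, and it assumes the mysql client can log in to the local node without prompting (for example through ~/.my.cnf).

```shell
# Pull the value out of tab-separated "name<TAB>value" status output.
# parse_cluster_size is a hypothetical helper name.
parse_cluster_size() {
    awk -F'\t' '$1 == "wsrep_cluster_size" { print $2 }'
}

# Ask the local node how many members it currently sees.
# Assumes passwordless client access; -N drops the header row, -B gives tab-separated output.
cluster_size() {
    mysql -N -B -e "SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size'" | parse_cluster_size
}
```

Calling cluster_size on any node should print 3 once all three nodes have joined.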