Galera new cluster WSREP Unknown error 141

galeramariadbmysql-clustermysql-replication

I am trying to create a new Galera cluster in the cloud using Ansible (DevOps tool). I can get the cluster started by issuing commands from an SSH shell, but not from Ansible. I have simplified the configuration to two nodes, but cannot get the second node to join the cluster. The cause seems to be an Unknown error 141. I cannot find any documentation on this error.

The cause of the failure appears to be:

    Jul 28 23:44:00 dev-database2 mysqld: 150729  9:44:00 [Warning] WSREP: 0.0 (dev-database1): State transfer to 1.0 (dev-database2) failed: -141 (Unknown error 141)

Can anyone explain what causes this error?

Configuration is Ubuntu 14.04, MariaDB 10.0.20, rsync replication using the same SSL certificates on each machine.

This is the log from a failed attempt to join the cluster. There is no logging on the other node for this period.

    Jul 28 23:43:56 dev-database2 ansible-command: Invoked with creates=None executable=None chdir=None args=service mysql start removes=None NO_LOG=None shell=True warn=True
    Jul 28 23:43:56 dev-database2 mysqld_safe: Starting mysqld daemon with databases from /crypt/mysql
    Jul 28 23:43:56 dev-database2 mysqld_safe: WSREP: Running position recovery with --log_error='/crypt/mysql/wsrep_recovery.YBBBx4' --pid-file='/crypt/mysql/dev-database2-recover.pid'
    Jul 28 23:43:56 dev-database2 mysqld: 150729  9:43:56 [Note] /usr/sbin/mysqld (mysqld 10.0.20-MariaDB-1~trusty-wsrep-log) starting as process 16430 ...
    Jul 28 23:43:59 dev-database2 mysqld_safe: WSREP: Recovered position 00000000-0000-0000-0000-000000000000:-1
    Jul 28 23:43:59 dev-database2 mysqld: 150729  9:43:59 [Note] /usr/sbin/mysqld (mysqld 10.0.20-MariaDB-1~trusty-wsrep-log) starting as process 16470 ...
    Jul 28 23:43:59 dev-database2 mysqld: 150729  9:43:59 [Note] WSREP: Setting wsrep_ready to 0
    Jul 28 23:43:59 dev-database2 mysqld: 150729  9:43:59 [Note] WSREP: Read nil XID from storage engines, skipping position init
    Jul 28 23:43:59 dev-database2 mysqld: 150729  9:43:59 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib/galera/libgalera_smm.so'
    Jul 28 23:43:59 dev-database2 mysqld: 150729  9:43:59 [Note] WSREP: wsrep_load(): Galera 3.9(rXXXX) by Codership Oy <info@codership.com> loaded successfully.
    Jul 28 23:43:59 dev-database2 mysqld: 150729  9:43:59 [Note] WSREP: CRC-32C: using hardware acceleration.
    Jul 28 23:43:59 dev-database2 mysqld: 150729  9:43:59 [Note] WSREP: Found saved state: 00000000-0000-0000-0000-000000000000:-1
    Jul 28 23:43:59 dev-database2 mysqld: 150729  9:43:59 [Note] WSREP: Passing config to GCS: base_host = 119.9.25.17; base_port = 4567; cert.log_conflicts = no; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /crypt/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /crypt/mysql//galera.cache; gcache.page_size = 128M; gcache.size = 128M; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announce_timeout = PT3S; pc.checksum = false; pc.ignore_quorum = false; pc.ignore_sb = false; pc.npvo =
    Jul 28 23:43:59 dev-database2 mysqld:  false; pc.recovery = 1
    Jul 28 23:43:59 dev-database2 mysqld: 150729  9:43:59 [Note] WSREP: Service thread queue flushed.
    Jul 28 23:43:59 dev-database2 mysqld: 150729  9:43:59 [Note] WSREP: Assign initial position for certification: -1, protocol version: -1
    Jul 28 23:43:59 dev-database2 mysqld: 150729  9:43:59 [Note] WSREP: wsrep_sst_grab()
    Jul 28 23:43:59 dev-database2 mysqld: 150729  9:43:59 [Note] WSREP: Start replication
    Jul 28 23:43:59 dev-database2 mysqld: 150729  9:43:59 [Note] WSREP: Setting initial position to 00000000-0000-0000-0000-000000000000:-1
    Jul 28 23:43:59 dev-database2 mysqld: 150729  9:43:59 [Note] WSREP: protonet asio version 0
    Jul 28 23:43:59 dev-database2 mysqld: 150729  9:43:59 [Note] WSREP: Using CRC-32C for message checksums.
    Jul 28 23:43:59 dev-database2 mysqld: 150729  9:43:59 [Note] WSREP: initializing ssl context
    Jul 28 23:43:59 dev-database2 mysqld: 150729  9:43:59 [Note] WSREP: backend: asio
    Jul 28 23:43:59 dev-database2 mysqld: 150729  9:43:59 [Warning] WSREP: access file(gvwstate.dat) failed(No such file or directory)
    Jul 28 23:43:59 dev-database2 mysqld: 150729  9:43:59 [Note] WSREP: restore pc from disk failed
    Jul 28 23:43:59 dev-database2 mysqld: 150729  9:43:59 [Note] WSREP: GMCast version 0
    Jul 28 23:43:59 dev-database2 mysqld: 150729  9:43:59 [Note] WSREP: (84b9a759, 'ssl://0.0.0.0:4567') listening at ssl://0.0.0.0:4567
    Jul 28 23:43:59 dev-database2 mysqld: 150729  9:43:59 [Note] WSREP: (84b9a759, 'ssl://0.0.0.0:4567') multicast: , ttl: 1
    Jul 28 23:43:59 dev-database2 mysqld: 150729  9:43:59 [Note] WSREP: EVS version 0
    Jul 28 23:43:59 dev-database2 mysqld: 150729  9:43:59 [Note] WSREP: gcomm: connecting to group 'Dev DB Cluster', peer '119.9.23.122:,119.9.25.17:'
    Jul 28 23:43:59 dev-database2 mysqld: 150729  9:43:59 [Note] WSREP: SSL handshake successful, remote endpoint ssl://119.9.25.17:55693 local endpoint ssl://119.9.25.17:4567 cipher: AES128-SHA compression:
    Jul 28 23:43:59 dev-database2 mysqld: 150729  9:43:59 [Note] WSREP: SSL handshake successful, remote endpoint ssl://119.9.25.17:4567 local endpoint ssl://119.9.25.17:55693 cipher: AES128-SHA compression:
    Jul 28 23:43:59 dev-database2 mysqld: 150729  9:43:59 [Warning] WSREP: (84b9a759, 'ssl://0.0.0.0:4567') address 'ssl://119.9.25.17:4567' points to own listening address, blacklisting
    Jul 28 23:43:59 dev-database2 mysqld: 150729  9:43:59 [Note] WSREP: SSL handshake successful, remote endpoint ssl://119.9.23.122:4567 local endpoint ssl://119.9.25.17:38347 cipher: AES128-SHA compression:
    Jul 28 23:43:59 dev-database2 mysqld: 150729  9:43:59 [Note] WSREP: (84b9a759, 'ssl://0.0.0.0:4567') address 'ssl://119.9.25.17:4567' pointing to uuid 84b9a759 is blacklisted, skipping
    Jul 28 23:43:59 dev-database2 mysqld: 150729  9:43:59 [Note] WSREP: (84b9a759, 'ssl://0.0.0.0:4567') turning message relay requesting on, nonlive peers:
    Jul 28 23:43:59 dev-database2 mysqld: 150729  9:43:59 [Note] WSREP: declaring 7ba9e675 at ssl://119.9.23.122:4567 stable
    Jul 28 23:43:59 dev-database2 mysqld: 150729  9:43:59 [Note] WSREP: Node 7ba9e675 state prim
    Jul 28 23:43:59 dev-database2 mysqld: 150729  9:43:59 [Note] WSREP: view(view_id(PRIM,7ba9e675,2) memb {
    Jul 28 23:43:59 dev-database2 mysqld: #0117ba9e675,0
    Jul 28 23:43:59 dev-database2 mysqld: #01184b9a759,0
    Jul 28 23:43:59 dev-database2 mysqld: } joined {
    Jul 28 23:43:59 dev-database2 mysqld: } left {
    Jul 28 23:43:59 dev-database2 mysqld: } partitioned {
    Jul 28 23:43:59 dev-database2 mysqld: })
    Jul 28 23:43:59 dev-database2 mysqld: 150729  9:43:59 [Note] WSREP: save pc into disk
    Jul 28 23:44:00 dev-database2 mysqld: 150729  9:44:00 [Note] WSREP: gcomm: connected
    Jul 28 23:44:00 dev-database2 mysqld: 150729  9:44:00 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
    Jul 28 23:44:00 dev-database2 mysqld: 150729  9:44:00 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
    Jul 28 23:44:00 dev-database2 mysqld: 150729  9:44:00 [Note] WSREP: Opened channel 'Dev DB Cluster'
    Jul 28 23:44:00 dev-database2 mysqld: 150729  9:44:00 [Note] WSREP: Waiting for SST to complete.
    Jul 28 23:44:00 dev-database2 mysqld: 150729  9:44:00 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 1, memb_num = 2
    Jul 28 23:44:00 dev-database2 mysqld: 150729  9:44:00 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
    Jul 28 23:44:00 dev-database2 mysqld: 150729  9:44:00 [Note] WSREP: STATE EXCHANGE: sent state msg: 850918ce-3582-11e5-bb92-1e01df56834d
    Jul 28 23:44:00 dev-database2 mysqld: 150729  9:44:00 [Note] WSREP: STATE EXCHANGE: got state msg: 850918ce-3582-11e5-bb92-1e01df56834d from 0 (dev-database1)
    Jul 28 23:44:00 dev-database2 mysqld: 150729  9:44:00 [Note] WSREP: STATE EXCHANGE: got state msg: 850918ce-3582-11e5-bb92-1e01df56834d from 1 (dev-database2)
    Jul 28 23:44:00 dev-database2 mysqld: 150729  9:44:00 [Note] WSREP: Quorum results:
    Jul 28 23:44:00 dev-database2 mysqld: #011version    = 3,
    Jul 28 23:44:00 dev-database2 mysqld: #011component  = PRIMARY,
    Jul 28 23:44:00 dev-database2 mysqld: #011conf_id    = 1,
    Jul 28 23:44:00 dev-database2 mysqld: #011members    = 1/2 (joined/total),
    Jul 28 23:44:00 dev-database2 mysqld: #011act_id     = 0,
    Jul 28 23:44:00 dev-database2 mysqld: #011last_appl. = -1,
    Jul 28 23:44:00 dev-database2 mysqld: #011protocols  = 0/7/3 (gcs/repl/appl),
    Jul 28 23:44:00 dev-database2 mysqld: #011group UUID = ed66210d-3581-11e5-b320-9e349b557133
    Jul 28 23:44:00 dev-database2 mysqld: 150729  9:44:00 [Note] WSREP: Flow-control interval: [23, 23]
    Jul 28 23:44:00 dev-database2 mysqld: 150729  9:44:00 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 0)
    Jul 28 23:44:00 dev-database2 mysqld: 150729  9:44:00 [Note] WSREP: State transfer required:
    Jul 28 23:44:00 dev-database2 mysqld: #011Group state: ed66210d-3581-11e5-b320-9e349b557133:0
    Jul 28 23:44:00 dev-database2 mysqld: #011Local state: 00000000-0000-0000-0000-000000000000:-1
    Jul 28 23:44:00 dev-database2 mysqld: 150729  9:44:00 [Note] WSREP: New cluster view: global state: ed66210d-3581-11e5-b320-9e349b557133:0, view# 2: Primary, number of nodes: 2, my index: 1, protocol version 3
    Jul 28 23:44:00 dev-database2 mysqld: 150729  9:44:00 [Warning] WSREP: Gap in state sequence. Need state transfer.
    Jul 28 23:44:00 dev-database2 mysqld: 150729  9:44:00 [Note] WSREP: Setting wsrep_ready to 0
    Jul 28 23:44:00 dev-database2 mysqld: 150729  9:44:00 [Note] WSREP: Running: 'wsrep_sst_rsync --role 'joiner' --address '119.9.25.17' --auth '' --datadir '/crypt/mysql/' --defaults-file '/etc/mysql/my.cnf' --defaults-group-suffix '' --parent '16470' --binlog '/var/log/mysql/mariadb-bin' '
    Jul 28 23:44:00 dev-database2 rsyncd[16521]: rsyncd version 3.1.0 starting, listening on port 4444
    Jul 28 23:44:00 dev-database2 mysqld: 150729  9:44:00 [Note] WSREP: Prepared SST request: rsync|119.9.25.17:4444/rsync_sst
    Jul 28 23:44:00 dev-database2 mysqld: 150729  9:44:00 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
    Jul 28 23:44:00 dev-database2 mysqld: 150729  9:44:00 [Note] WSREP: REPL Protocols: 7 (3, 2)
    Jul 28 23:44:00 dev-database2 mysqld: 150729  9:44:00 [Note] WSREP: Service thread queue flushed.
    Jul 28 23:44:00 dev-database2 mysqld: 150729  9:44:00 [Note] WSREP: Assign initial position for certification: 0, protocol version: 3
    Jul 28 23:44:00 dev-database2 mysqld: 150729  9:44:00 [Note] WSREP: Service thread queue flushed.
    Jul 28 23:44:00 dev-database2 mysqld: 150729  9:44:00 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (ed66210d-3581-11e5-b320-9e349b557133): 1 (Operation not permitted)
    Jul 28 23:44:00 dev-database2 mysqld: #011 at galera/src/replicator_str.cpp:prepare_for_IST():456. IST will be unavailable.
    Jul 28 23:44:00 dev-database2 mysqld: 150729  9:44:00 [Note] WSREP: Member 1.0 (dev-database2) requested state transfer from '*any*'. Selected 0.0 (dev-database1)(SYNCED) as donor.
    Jul 28 23:44:00 dev-database2 mysqld: 150729  9:44:00 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 0)
    Jul 28 23:44:00 dev-database2 mysqld: 150729  9:44:00 [Note] WSREP: Requesting state transfer: success, donor: 0
    Jul 28 23:44:00 dev-database2 mysqld: 150729  9:44:00 [Warning] WSREP: 0.0 (dev-database1): State transfer to 1.0 (dev-database2) failed: -141 (Unknown error 141)
    Jul 28 23:44:00 dev-database2 mysqld: 150729  9:44:00 [ERROR] WSREP: gcs/src/gcs_group.cpp:int gcs_group_handle_join_msg(gcs_group_t*, const gcs_recv_msg_t*)():731: Will never receive state. Need to abort.
    Jul 28 23:44:00 dev-database2 mysqld: 150729  9:44:00 [Note] WSREP: gcomm: terminating thread
    Jul 28 23:44:00 dev-database2 mysqld: 150729  9:44:00 [Note] WSREP: gcomm: joining thread
    Jul 28 23:44:00 dev-database2 mysqld: 150729  9:44:00 [Note] WSREP: gcomm: closing backend
    Jul 28 23:44:01 dev-database2 mysqld: 150729  9:44:01 [Note] WSREP: view(view_id(NON_PRIM,7ba9e675,2) memb {
    Jul 28 23:44:01 dev-database2 mysqld: #01184b9a759,0
    Jul 28 23:44:01 dev-database2 mysqld: } joined {
    Jul 28 23:44:01 dev-database2 mysqld: } left {
    Jul 28 23:44:01 dev-database2 mysqld: } partitioned {
    Jul 28 23:44:01 dev-database2 mysqld: #0117ba9e675,0
    Jul 28 23:44:01 dev-database2 mysqld: })
    Jul 28 23:44:01 dev-database2 mysqld: 150729  9:44:01 [Note] WSREP: view((empty))
    Jul 28 23:44:01 dev-database2 mysqld: 150729  9:44:01 [Note] WSREP: gcomm: closed
    Jul 28 23:44:01 dev-database2 mysqld: 150729  9:44:01 [Note] WSREP: /usr/sbin/mysqld: Terminated.
    Jul 28 23:44:02 dev-database2 mysqld: WSREP_SST: [ERROR] Parent mysqld process (PID:16470) terminated unexpectedly. (20150729 09:44:02.440)
    Jul 28 23:44:02 dev-database2 mysqld: WSREP_SST: [INFO] Joiner cleanup. (20150729 09:44:02.442)
    Jul 28 23:44:02 dev-database2 rsyncd[16521]: sent 0 bytes  received 0 bytes  total size 0
    Jul 28 23:44:02 dev-database2 mysqld: WSREP_SST: [INFO] Joiner cleanup done. (20150729 09:44:02.948)
    Jul 28 23:44:02 dev-database2 mysqld_safe: mysqld from pid file /var/run/mysqld/mysqld.pid ended
    Jul 28 23:44:28 dev-database2 /etc/init.d/mysql[16869]: 0 processes alive and '/usr/bin/mysqladmin --defaults-file=/etc/mysql/debian.cnf ping' resulted in
    Jul 28 23:44:28 dev-database2 /etc/init.d/mysql[16869]: #007/usr/bin/mysqladmin: connect to server at 'localhost' failed
    Jul 28 23:44:28 dev-database2 /etc/init.d/mysql[16869]: error: 'Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (111 "Connection refused")'
    Jul 28 23:44:28 dev-database2 /etc/init.d/mysql[16869]: Check that mysqld is running and that the socket: '/var/run/mysqld/mysqld.sock' exists!
    Jul 28 23:44:28 dev-database2 /etc/init.d/mysql[16869]:

This is the log from a successful joining. All I did after the above was stop the mysql servers then run sudo service mysql start –wsrep-new-cluster on the other node and sudo service mysql start on this node.

    Jul 29 00:08:02 dev-database2 mysqld_safe: Starting mysqld daemon with databases from /crypt/mysql
    Jul 29 00:08:02 dev-database2 mysqld_safe: WSREP: Running position recovery with --log_error='/crypt/mysql/wsrep_recovery.b5BLaP' --pid-file='/crypt/mysql/dev-database2-recover.pid'
    Jul 29 00:08:02 dev-database2 mysqld: 150729 10:08:02 [Note] /usr/sbin/mysqld (mysqld 10.0.20-MariaDB-1~trusty-wsrep-log) starting as process 19162 ...
    Jul 29 00:08:05 dev-database2 mysqld_safe: WSREP: Recovered position 00000000-0000-0000-0000-000000000000:-1
    Jul 29 00:08:05 dev-database2 mysqld: 150729 10:08:05 [Note] /usr/sbin/mysqld (mysqld 10.0.20-MariaDB-1~trusty-wsrep-log) starting as process 19203 ...
    Jul 29 00:08:05 dev-database2 mysqld: 150729 10:08:05 [Note] WSREP: Setting wsrep_ready to 0
    Jul 29 00:08:05 dev-database2 mysqld: 150729 10:08:05 [Note] WSREP: Read nil XID from storage engines, skipping position init
    Jul 29 00:08:05 dev-database2 mysqld: 150729 10:08:05 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib/galera/libgalera_smm.so'
    Jul 29 00:08:05 dev-database2 mysqld: 150729 10:08:05 [Note] WSREP: wsrep_load(): Galera 3.9(rXXXX) by Codership Oy <info@codership.com> loaded successfully.
    Jul 29 00:08:05 dev-database2 mysqld: 150729 10:08:05 [Note] WSREP: CRC-32C: using hardware acceleration.
    Jul 29 00:08:05 dev-database2 mysqld: 150729 10:08:05 [Note] WSREP: Found saved state: 00000000-0000-0000-0000-000000000000:-1
    Jul 29 00:08:05 dev-database2 mysqld: 150729 10:08:05 [Note] WSREP: Passing config to GCS: base_host = 119.9.25.17; base_port = 4567; cert.log_conflicts = no; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /crypt/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /crypt/mysql//galera.cache; gcache.page_size = 128M; gcache.size = 128M; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announce_timeout = PT3S; pc.checksum = false; pc.ignore_quorum = false; pc.ignore_sb = false; pc.npvo =
    Jul 29 00:08:05 dev-database2 mysqld:  false; pc.recovery = 1
    Jul 29 00:08:05 dev-database2 mysqld: 150729 10:08:05 [Note] WSREP: Service thread queue flushed.
    Jul 29 00:08:05 dev-database2 mysqld: 150729 10:08:05 [Note] WSREP: Assign initial position for certification: -1, protocol version: -1
    Jul 29 00:08:05 dev-database2 mysqld: 150729 10:08:05 [Note] WSREP: wsrep_sst_grab()
    Jul 29 00:08:05 dev-database2 mysqld: 150729 10:08:05 [Note] WSREP: Start replication
    Jul 29 00:08:05 dev-database2 mysqld: 150729 10:08:05 [Note] WSREP: Setting initial position to 00000000-0000-0000-0000-000000000000:-1
    Jul 29 00:08:05 dev-database2 mysqld: 150729 10:08:05 [Note] WSREP: protonet asio version 0
    Jul 29 00:08:05 dev-database2 mysqld: 150729 10:08:05 [Note] WSREP: Using CRC-32C for message checksums.
    Jul 29 00:08:05 dev-database2 mysqld: 150729 10:08:05 [Note] WSREP: initializing ssl context
    Jul 29 00:08:05 dev-database2 mysqld: 150729 10:08:05 [Note] WSREP: backend: asio
    Jul 29 00:08:05 dev-database2 mysqld: 150729 10:08:05 [Warning] WSREP: access file(gvwstate.dat) failed(No such file or directory)
    Jul 29 00:08:05 dev-database2 mysqld: 150729 10:08:05 [Note] WSREP: restore pc from disk failed
    Jul 29 00:08:05 dev-database2 mysqld: 150729 10:08:05 [Note] WSREP: GMCast version 0
    Jul 29 00:08:05 dev-database2 mysqld: 150729 10:08:05 [Note] WSREP: (e28b5f7d, 'ssl://0.0.0.0:4567') listening at ssl://0.0.0.0:4567
    Jul 29 00:08:05 dev-database2 mysqld: 150729 10:08:05 [Note] WSREP: (e28b5f7d, 'ssl://0.0.0.0:4567') multicast: , ttl: 1
    Jul 29 00:08:05 dev-database2 mysqld: 150729 10:08:05 [Note] WSREP: EVS version 0
    Jul 29 00:08:05 dev-database2 mysqld: 150729 10:08:05 [Note] WSREP: gcomm: connecting to group 'Dev DB Cluster', peer '119.9.23.122:,119.9.25.17:'
    Jul 29 00:08:05 dev-database2 mysqld: 150729 10:08:05 [Note] WSREP: SSL handshake successful, remote endpoint ssl://119.9.25.17:55697 local endpoint ssl://119.9.25.17:4567 cipher: AES128-SHA compression:
    Jul 29 00:08:05 dev-database2 mysqld: 150729 10:08:05 [Note] WSREP: SSL handshake successful, remote endpoint ssl://119.9.25.17:4567 local endpoint ssl://119.9.25.17:55697 cipher: AES128-SHA compression:
    Jul 29 00:08:05 dev-database2 mysqld: 150729 10:08:05 [Warning] WSREP: (e28b5f7d, 'ssl://0.0.0.0:4567') address 'ssl://119.9.25.17:4567' points to own listening address, blacklisting
    Jul 29 00:08:05 dev-database2 mysqld: 150729 10:08:05 [Note] WSREP: SSL handshake successful, remote endpoint ssl://119.9.23.122:4567 local endpoint ssl://119.9.25.17:38351 cipher: AES128-SHA compression:
    Jul 29 00:08:05 dev-database2 mysqld: 150729 10:08:05 [Note] WSREP: (e28b5f7d, 'ssl://0.0.0.0:4567') address 'ssl://119.9.25.17:4567' pointing to uuid e28b5f7d is blacklisted, skipping
    Jul 29 00:08:05 dev-database2 mysqld: 150729 10:08:05 [Note] WSREP: (e28b5f7d, 'ssl://0.0.0.0:4567') turning message relay requesting on, nonlive peers:
    Jul 29 00:08:05 dev-database2 mysqld: 150729 10:08:05 [Note] WSREP: declaring cab39946 at ssl://119.9.23.122:4567 stable
    Jul 29 00:08:05 dev-database2 mysqld: 150729 10:08:05 [Note] WSREP: Node cab39946 state prim
    Jul 29 00:08:05 dev-database2 mysqld: 150729 10:08:05 [Note] WSREP: view(view_id(PRIM,cab39946,2) memb {
    Jul 29 00:08:05 dev-database2 mysqld: #011cab39946,0
    Jul 29 00:08:05 dev-database2 mysqld: #011e28b5f7d,0
    Jul 29 00:08:05 dev-database2 mysqld: } joined {
    Jul 29 00:08:05 dev-database2 mysqld: } left {
    Jul 29 00:08:05 dev-database2 mysqld: } partitioned {
    Jul 29 00:08:05 dev-database2 mysqld: })
    Jul 29 00:08:05 dev-database2 mysqld: 150729 10:08:05 [Note] WSREP: save pc into disk
    Jul 29 00:08:06 dev-database2 mysqld: 150729 10:08:06 [Note] WSREP: gcomm: connected
    Jul 29 00:08:06 dev-database2 mysqld: 150729 10:08:06 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
    Jul 29 00:08:06 dev-database2 mysqld: 150729 10:08:06 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
    Jul 29 00:08:06 dev-database2 mysqld: 150729 10:08:06 [Note] WSREP: Opened channel 'Dev DB Cluster'
    Jul 29 00:08:06 dev-database2 mysqld: 150729 10:08:06 [Note] WSREP: Waiting for SST to complete.
    Jul 29 00:08:06 dev-database2 mysqld: 150729 10:08:06 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 1, memb_num = 2
    Jul 29 00:08:06 dev-database2 mysqld: 150729 10:08:06 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
    Jul 29 00:08:06 dev-database2 mysqld: 150729 10:08:06 [Note] WSREP: STATE EXCHANGE: sent state msg: e2de5bf6-3585-11e5-bb24-8ac7dfc1b048
    Jul 29 00:08:06 dev-database2 mysqld: 150729 10:08:06 [Note] WSREP: STATE EXCHANGE: got state msg: e2de5bf6-3585-11e5-bb24-8ac7dfc1b048 from 0 (dev-database1)
    Jul 29 00:08:06 dev-database2 mysqld: 150729 10:08:06 [Note] WSREP: STATE EXCHANGE: got state msg: e2de5bf6-3585-11e5-bb24-8ac7dfc1b048 from 1 (dev-database2)
    Jul 29 00:08:06 dev-database2 mysqld: 150729 10:08:06 [Note] WSREP: Quorum results:
    Jul 29 00:08:06 dev-database2 mysqld: #011version    = 3,
    Jul 29 00:08:06 dev-database2 mysqld: #011component  = PRIMARY,
    Jul 29 00:08:06 dev-database2 mysqld: #011conf_id    = 1,
    Jul 29 00:08:06 dev-database2 mysqld: #011members    = 1/2 (joined/total),
    Jul 29 00:08:06 dev-database2 mysqld: #011act_id     = 0,
    Jul 29 00:08:06 dev-database2 mysqld: #011last_appl. = -1,
    Jul 29 00:08:06 dev-database2 mysqld: #011protocols  = 0/7/3 (gcs/repl/appl),
    Jul 29 00:08:06 dev-database2 mysqld: #011group UUID = ed66210d-3581-11e5-b320-9e349b557133
    Jul 29 00:08:06 dev-database2 mysqld: 150729 10:08:06 [Note] WSREP: Flow-control interval: [23, 23]
    Jul 29 00:08:06 dev-database2 mysqld: 150729 10:08:06 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 0)
    Jul 29 00:08:06 dev-database2 mysqld: 150729 10:08:06 [Note] WSREP: State transfer required:
    Jul 29 00:08:06 dev-database2 mysqld: #011Group state: ed66210d-3581-11e5-b320-9e349b557133:0
    Jul 29 00:08:06 dev-database2 mysqld: #011Local state: 00000000-0000-0000-0000-000000000000:-1
    Jul 29 00:08:06 dev-database2 mysqld: 150729 10:08:06 [Note] WSREP: New cluster view: global state: ed66210d-3581-11e5-b320-9e349b557133:0, view# 2: Primary, number of nodes: 2, my index: 1, protocol version 3
    Jul 29 00:08:06 dev-database2 mysqld: 150729 10:08:06 [Warning] WSREP: Gap in state sequence. Need state transfer.
    Jul 29 00:08:06 dev-database2 mysqld: 150729 10:08:06 [Note] WSREP: Setting wsrep_ready to 0
    Jul 29 00:08:06 dev-database2 mysqld: 150729 10:08:06 [Note] WSREP: Running: 'wsrep_sst_rsync --role 'joiner' --address '119.9.25.17' --auth '' --datadir '/crypt/mysql/' --defaults-file '/etc/mysql/my.cnf' --defaults-group-suffix '' --parent '19203' --binlog '/var/log/mysql/mariadb-bin' '
    Jul 29 00:08:06 dev-database2 rsyncd[19253]: rsyncd version 3.1.0 starting, listening on port 4444
    Jul 29 00:08:06 dev-database2 mysqld: 150729 10:08:06 [Note] WSREP: Prepared SST request: rsync|119.9.25.17:4444/rsync_sst
    Jul 29 00:08:06 dev-database2 mysqld: 150729 10:08:06 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
    Jul 29 00:08:06 dev-database2 mysqld: 150729 10:08:06 [Note] WSREP: REPL Protocols: 7 (3, 2)
    Jul 29 00:08:06 dev-database2 mysqld: 150729 10:08:06 [Note] WSREP: Service thread queue flushed.
    Jul 29 00:08:06 dev-database2 mysqld: 150729 10:08:06 [Note] WSREP: Assign initial position for certification: 0, protocol version: 3
    Jul 29 00:08:06 dev-database2 mysqld: 150729 10:08:06 [Note] WSREP: Service thread queue flushed.
    Jul 29 00:08:06 dev-database2 mysqld: 150729 10:08:06 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (ed66210d-3581-11e5-b320-9e349b557133): 1 (Operation not permitted)
    Jul 29 00:08:06 dev-database2 mysqld: #011 at galera/src/replicator_str.cpp:prepare_for_IST():456. IST will be unavailable.
    Jul 29 00:08:06 dev-database2 mysqld: 150729 10:08:06 [Note] WSREP: Member 1.0 (dev-database2) requested state transfer from '*any*'. Selected 0.0 (dev-database1)(SYNCED) as donor.
    Jul 29 00:08:06 dev-database2 mysqld: 150729 10:08:06 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 0)
    Jul 29 00:08:06 dev-database2 mysqld: 150729 10:08:06 [Note] WSREP: Requesting state transfer: success, donor: 0
    Jul 29 00:08:06 dev-database2 rsyncd[19268]: name lookup failed for 119.9.23.122: Name or service not known
    Jul 29 00:08:06 dev-database2 rsyncd[19268]: connect from UNKNOWN (119.9.23.122)
    Jul 29 00:08:06 dev-database2 rsyncd[19268]: rsync to rsync_sst/ from UNKNOWN (119.9.23.122)
    Jul 29 00:08:06 dev-database2 rsyncd[19268]: receiving file list
    Jul 29 00:08:07 dev-database2 rsyncd[19268]: sent 63 bytes  received 12,596,622 bytes  total size 12,593,152
    Jul 29 00:08:08 dev-database2 rsyncd[19283]: name lookup failed for 119.9.23.122: Name or service not known
    Jul 29 00:08:08 dev-database2 rsyncd[19283]: connect from UNKNOWN (119.9.23.122)
    Jul 29 00:08:08 dev-database2 rsyncd[19283]: rsync to rsync_sst-log_dir/ from UNKNOWN (119.9.23.122)
    Jul 29 00:08:08 dev-database2 rsyncd[19283]: receiving file list
    Jul 29 00:08:08 dev-database2 mysqld: 150729 10:08:08 [Note] WSREP: (e28b5f7d, 'ssl://0.0.0.0:4567') turning message relay requesting off

Best Answer

We had this problem too, and it took me a while to figure out what was going on because I managed to get Galera working at one point, then it failed again when I tried to repeat it all from scratch.

The not-very-descriptive unknown error -141 appears to occur when spawned processes (such as SST backup runs) fail on startup, when they try to write to the output descriptor that should handle their log messages.

For me the clue was that in the cases where I had this -141 error on the second node, I also noticed that I was not getting any log messages at all from Galera on the first node, even though on the second node I could quite clearly see that Galera had already gone through a complete negotiation to join the cluster successfully, which would normally log at least 40 lines of info on the first node.

I think you may find that the reason why this happens when you use an automation tool, but not when you use the shell, is that mysql is configured to write logs to syslog, but when you start or restart mysql with an automation tool the logger subprocess dies on startup rather than launching successfully. Writes to this descriptor then fail. Mysql itself appears to gracefully handle this by ignoring it, whereas the SST scripts don't and therefore crash.

So why would the logger subprocess fail to launch under an automation tool but not at the shell? It seems to be due to short-lived PTY allocation - I believe specifically the problem is that if the PTY is closed immediately after running the command, which was not yet forked or set up, this breaks it.

For me using a ruby script (capistrano v2) I simply had to add :pty => false to my run command so that the PTY was never allocated in the first place. I don't know if there is a way to do this with Ansible though?

(You can also work around by running the service restart script under something like nohup, although that has undesirable side effects such as making nohup.out files. My understanding is that the HUP signal itself is not the problem as I didn't have to do anything to disown the process when I used the :pty => false option.)

Related Topic