MySQL (Perconadb) Galera/Xtrabackup cluster join fails with “Invalid Argument”

galera, MySQL, percona-xtradb-cluster, xtrabackup

I have a MySQL Galera cluster, using Perconadb and Xtrabackup. The nodes can start stand-alone, or can join the cluster if only an IST is required. However, if an SST is required, then this runs to completion and then fails.

The logs show that, after the xtrabackup SST completes, the script exits with status 22 (Invalid argument), which causes the SST to be rolled back and the node to fail to come up.

2018-08-09 00:43:25 860 [Note] WSREP: 0.0 (xmdadb01): State transfer to 1.0 (xmdadb02) complete.
2018-08-09 00:43:25 860 [Note] WSREP: Member 0.0 (xmdadb01) synced with group.
2018-08-09 00:43:25 860 [ERROR] WSREP: Process completed with error: wsrep_sst_xtrabackup-v2 --role 'joiner' --address '10.93.40.122' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --defaults-group-suffix '' --parent '860'  '' : 22 (Invalid argument)
2018-08-09 00:43:25 860 [ERROR] WSREP: Failed to read uuid:seqno from joiner script.
2018-08-09 00:43:25 860 [ERROR] WSREP: SST script aborted with error 22 (Invalid argument)
2018-08-09 00:43:25 860 [ERROR] WSREP: SST failed: 22 (Invalid argument)
2018-08-09 00:43:25 860 [ERROR] Aborting

The relevant parts of the my.cnf:

[mysqld]
wsrep_provider=/usr/lib64/galera3/libgalera_smm.so
wsrep_provider_options="gcache.size=256M;gcs.fc_factor=1.0;gcs.fc_limit=512;gcs.fc_master_slave=YES;pc.checksum=true;"
wsrep_cluster_name="galera01-xmd"
wsrep_cluster_address="gcomm://10.93.40.121:4567,10.93.40.122:4567"
wsrep_node_name=xmdadb02
wsrep_node_address="10.93.40.122"
wsrep_sst_method=xtrabackup-v2
wsrep_sst_auth=sst_user:password-goes-in-here

As the SST runs, I can see the files coming over into /var/lib/mysql/.sst, so I know the transfer itself is working. I have verified that the user and password are correct. So why is the xtrabackup-v2 script returning 22, and how can I stop it from doing so, so that the SST can complete?

Annoyingly, when this setup was first installed, SST worked without issue. I do not know what changed in the intervening time to prevent SST while still allowing IST to work.

Best Answer

Because Galera has a creative outlook on what constitutes a meaningful error message, don't expect EINVAL 22 to correspond to a syscall return code.
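To illustrate the point, here is a minimal Python sketch. It assumes the server simply takes the SST script's exit status and formats it as if it were an errno value, which the "22 (Invalid argument)" log line suggests but which nothing quoted here confirms. The variable names are made up for the example.

```python
import errno
import os

# Exit status taken from the log line "... : 22 (Invalid argument)".
sst_exit_status = 22

# 22 happens to be the numeric value of errno EINVAL, so formatting the
# exit status as an errno yields "Invalid argument" regardless of what
# actually went wrong inside the script.
symbol = errno.errorcode.get(sst_exit_status, "unknown")
message = os.strerror(sst_exit_status)

print(f"{sst_exit_status} -> {symbol}: {message}")
```

If that assumption holds, the text in the log says nothing about the real cause; it is only the generic label attached to the number the script chose to exit with.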

Take a look at some of the code around this EINVAL text in their source.

Fixing it isn't a priority.
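Since the status code itself is uninformative, the useful detail is more likely in whatever logs the SST script leaves behind on the joiner. The following is a rough sketch only: it assumes your version of wsrep_sst_xtrabackup-v2 writes innobackup.prepare.log and innobackup.move.log into the datadir (check your own script, as this varies by version), and the helper name find_sst_errors and its keyword list are hypothetical.

```python
from pathlib import Path

# Assumed log names; adjust to whatever your SST script version writes.
JOINER_LOGS = ("innobackup.prepare.log", "innobackup.move.log")


def find_sst_errors(datadir, log_names=JOINER_LOGS,
                    keywords=("error", "fatal", "failed")):
    """Return (log name, line number, text) for suspicious lines in SST logs."""
    hits = []
    for name in log_names:
        path = Path(datadir) / name
        if not path.is_file():
            continue  # this log was not produced, or was cleaned up
        with path.open(errors="replace") as handle:
            for lineno, line in enumerate(handle, start=1):
                # Case-insensitive match on any of the keywords.
                if any(word in line.lower() for word in keywords):
                    hits.append((name, lineno, line.rstrip()))
    return hits
```

Calling find_sst_errors("/var/lib/mysql") on the joiner right after a failed SST would list the lines worth reading first. An empty result only means none of the assumed logs contained the keywords, not that the SST was clean.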