GlusterFS not running on correct port! (peer disconnected / brick not starting)

Tags: centos7, glusterfs

On CentOS 7, with two bricks on srv1 and srv2.

I've upgraded Gluster from 3.13 to 6 using yum. I then rebooted srv1, and started and mounted the volume successfully.

This is my mount command:
/usr/sbin/mount.glusterfs 127.0.0.1:/RepVol /home -o direct-io-mode=enable

I then restarted srv2, and now I cannot mount:

[2019-08-29 14:16:01.354362] I [MSGID: 101190] [event-epoll.c:680:event_dispatch_epoll_worker] 0-epoll: Started thread with index 0
[2019-08-29 14:16:01.354402] I [glusterfsd-mgmt.c:2443:mgmt_rpc_notify] 0-glusterfsd-mgmt: disconnected from remote-host: srv2
[2019-08-29 14:16:01.354409] I [glusterfsd-mgmt.c:2463:mgmt_rpc_notify] 0-glusterfsd-mgmt: Exhausted all volfile servers
[2019-08-29 14:16:01.354600] W [glusterfsd.c:1570:cleanup_and_exit] (-->/lib64/libgfrpc.so.0(+0xf1d3) [0x7f477284f1d3] -->/usr/sbin/glusterfsd(+0x12fef) [0x564e35a67fef] -->/usr/sbin/glusterfsd(cleanup_and_exit+0x6b) [0x564e35a6001b] ) 0-: received signum (1), shutting down
[2019-08-29 14:16:01.357036] I [socket.c:3754:socket_submit_outgoing_msg] 0-glusterfs: not connected (priv->connected = 0)
[2019-08-29 14:16:01.357050] W [rpc-clnt.c:1704:rpc_clnt_submit] 0-glusterfs: failed to submit rpc-request (unique: 0, XID: 0x2 Program: Gluster Portmap, ProgVers: 1, Proc: 5) to rpc-transport (glusterfs)

The error message is "Exhausted all volfile servers", which as I understand it means the mount helper couldn't fetch the volume definition from any volfile server (here just the local glusterd). At least, that's the only thing that looks like an error to me.

Output of gluster volume status on srv1:

Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick srv1:/datafold                        49152     0          Y       16291
Self-heal Daemon on localhost               N/A       N/A        Y       16313

Task Status of Volume RepVol
------------------------------------------------------------------------------
There are no active volume tasks

Output of gluster volume status on srv2:

Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick srv1:/datafold                        49152     0          Y       16291
Brick srv2:/datafold                        N/A       N/A        N       N/A
Self-heal Daemon on localhost               N/A       N/A        N       N/A
Self-heal Daemon on srv1                    N/A       N/A        Y       16313

Task Status of Volume RepVol
------------------------------------------------------------------------------
There are no active volume tasks

So it makes sense that the mount fails while the brick is offline. However, I have no clue how to start this brick, even after searching for hours. It would be nice to find a solution.
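For anyone who lands here: the standard suggestions for an offline brick are to restart glusterd on the affected node and to force-start the volume (which respawns dead brick processes). Sketching them here for completeness; in my case the brick stayed offline regardless:

# On srv2: restart the management daemon first
systemctl restart glusterd

# Force-start the volume; this restarts any brick processes that are down
gluster volume start RepVol force

# Check whether the brick came back
gluster volume status RepVol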

I tried removing the volume so I could recreate it, but it complains that not all bricks are connected.
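(For clarity, by "removing the volume" I mean roughly this, assuming the brick data can be recreated afterwards:)

gluster volume stop RepVol
gluster volume delete RepVol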

I also read that Gluster uses IPv6 by default since version 5, but I'm not sure how that would affect my setup, since srv1 seems to be up and running.
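For anyone chasing the IPv6 angle: the address family can reportedly be pinned back to IPv4 in /etc/glusterfs/glusterd.vol (the stock config path), followed by a glusterd restart. Noting it here even though srv1 works fine without it:

# /etc/glusterfs/glusterd.vol, inside the "volume management" block:
option transport.address-family inet

# then, on that node:
systemctl restart glusterd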

EDIT:

glusterd is not listening on the right port! It should be 24007, but netstat shows:
netstat -tulpn | grep gluster
tcp 0 0 0.0.0.0:34678 0.0.0.0:* LISTEN 28743/glusterd

What the hell? How do I fix this? Restarting does nothing except assign another random port:
tcp 0 0 0.0.0.0:43914 0.0.0.0:* LISTEN 17134/glusterd

Why is it not running on 24007?
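(Noting for later readers: newer glusterfs-server packages reportedly pin the management port via an option in /etc/glusterfs/glusterd.vol, and an upgrade that leaves an old config behind can lose it, so it's worth checking whether the line is still there:)

grep listen-port /etc/glusterfs/glusterd.vol
# a stock config should contain:
#   option transport.socket.listen-port 24007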

Best Answer

I removed glusterfs-server (yum remove glusterfs-server -y) and installed it again:

yum install glusterfs-server -y
systemctl enable glusterd.service
systemctl start glusterd.service

It then started at port 24007 and everything worked again.
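For completeness, a quick way to confirm everything is back (a sketch; RepVol is my volume name, adjust as needed):

netstat -tulpn | grep glusterd     # should now show 0.0.0.0:24007
gluster volume status RepVol       # both bricks should be Online (Y)
gluster volume heal RepVol info    # check for pending self-heals after the outage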

I just wasted a couple of hours because glusterd decided a random port would be fine while 24007 wasn't even in use, great!
