Linux – GlusterFS v3.10 not mounting on boot on CentOS 7.3-1611

glusterfs, linux

FIXED, for me at least, and I have no idea how. I've run through the logs from when it wasn't working to now, when it is, and I cannot for the life of me see any difference. What is different (and I don't know if it's a coincidence) is that when I execute

gluster volume status

on both nodes, they now both say Task Status of Volume glustervol1, whereas before on server2 it showed the hostname of the box. I have no idea how that happened, but it did… Don't know if that fixed it or what, but it sorted itself out on its own after numerous reboots.

Good luck.

STILL?! There's a lot of writing on this from around 2014 for Ubuntu 14.04 using init. I'm running CentOS 7.3-1611, fully patched, with kernel 3.10.0-514.10.2.el7, and Gluster volumes still don't mount after a reboot on servers where the LVM bricks and the client volume mount are on the same machine.

I have 3 boxes

  • server1: server (peer 1) and client
  • server2: server (peer 2) and client
  • server3: client only

They use an LVM backend, and the Gluster volume should mount at /data/glusterfs. The issue isn't present on server3, where it's only a client; it connects and mounts using the same fstab rules as the other servers. I've dug into the data logs, into SELinux, and into the startup log, and I can't find a way around it. I've considered CTDB and tried autofs, to no avail.
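For context, the volume graph further down shows a replica 2 volume across gluster1 and gluster2 with bricks at /data/bricks/brick1/brick, so it would have been created along these lines (a sketch reconstructed from that graph, not the exact commands used):

# run on gluster1: join the peer, create the replica 2 volume, start it
gluster peer probe gluster2
gluster volume create glustervol1 replica 2 \
    gluster1:/data/bricks/brick1/brick \
    gluster2:/data/bricks/brick1/brick
gluster volume start glustervol1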

gluster version


glusterfs 3.10.0
Repository revision: git://git.gluster.org/glusterfs.git
Copyright (c) 2006-2016 Red Hat, Inc. <https://www.gluster.org/>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
It is licensed to you under your choice of the GNU Lesser
General Public License, version 3 or any later version (LGPLv3
or later), or the GNU General Public License, version 2 (GPLv2),
in all cases as published by the Free Software Foundation.

fstab


/dev/vg_gluster/brick1 /data/bricks/brick1 xfs defaults 0 0
gluster1:/glustervol1 /data/glusterfs glusterfs defaults,_netdev 0 0

What's expected


sdb LVM2_member 6QrvQI-v5L9-bds3-BUn0-ySdB-hDmz-nVojpX
└─vg_gluster-brick1 xfs d181747c-8ed3-430c-bd1c-0b7968666dfe /data/bricks/brick1
and
gluster1:/glustervol1 49G 33M 49G 1% /data/glusterfs

This works by running the mount manually (mount -t glusterfs ...) or by executing mount -a with the rules in my fstab, but it will not work at boot. I've read that it's something to do with the mount being attempted before the glusterd daemon has started. What is the best workaround for this? Is it to edit systemd unit files? Does anyone know a fix?
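One systemd-level workaround that gets suggested for this ordering problem is a drop-in for the mount unit that systemd generates from the fstab entry (a sketch, untested here; the unit name is derived from the mount point, so confirm it with systemctl list-units -t mount):

# /etc/systemd/system/data-glusterfs.mount.d/override.conf
# Order the generated mount unit for /data/glusterfs after the local glusterd.
# Note this only waits for glusterd itself; the brick processes can take a
# moment longer, which is exactly what the log below shows.
[Unit]
Requires=glusterd.service
After=glusterd.service

After creating the drop-in, run systemctl daemon-reload so it is picked up.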

This is a snippet from a fresh boot while trying to mount through fstab, where it says that there is no brick process running.


[2017-04-03 16:35:47.353523] I [MSGID: 100030] [glusterfsd.c:2460:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.10.0 (args: /usr/sbin/glusterfs --volfile-server=gluster1 --volfile-id=/glustervol1 /data/glusterfs)
[2017-04-03 16:35:47.456915] I [MSGID: 101190] [event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2017-04-03 16:35:48.711381] I [afr.c:94:fix_quorum_options] 0-glustervol1-replicate-0: reindeer: incoming qtype = none
[2017-04-03 16:35:48.711398] I [afr.c:116:fix_quorum_options] 0-glustervol1-replicate-0: reindeer: quorum_count = 0
[2017-04-03 16:35:48.712437] I [socket.c:4120:socket_init] 0-glustervol1-client-1: SSL support on the I/O path is ENABLED
[2017-04-03 16:35:48.712451] I [socket.c:4140:socket_init] 0-glustervol1-client-1: using private polling thread
[2017-04-03 16:35:48.712892] E [socket.c:4201:socket_init] 0-glustervol1-client-1: failed to open /etc/ssl/dhparam.pem, DH ciphers are disabled
[2017-04-03 16:35:48.713139] I [MSGID: 101190] [event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll: Started thread with index 2
[2017-04-03 16:35:48.759228] I [socket.c:4120:socket_init] 0-glustervol1-client-0: SSL support on the I/O path is ENABLED
[2017-04-03 16:35:48.759243] I [socket.c:4140:socket_init] 0-glustervol1-client-0: using private polling thread
[2017-04-03 16:35:48.759308] E [socket.c:4201:socket_init] 0-glustervol1-client-0: failed to open /etc/ssl/dhparam.pem, DH ciphers are disabled
[2017-04-03 16:35:48.759596] W [MSGID: 101174] [graph.c:361:_log_if_unknown_option] 0-glustervol1-readdir-ahead: option 'parallel-readdir' is not recognized
[2017-04-03 16:35:48.759680] I [MSGID: 114020] [client.c:2352:notify] 0-glustervol1-client-0: parent translators are ready, attempting connect on transport
[2017-04-03 16:35:48.762408] I [MSGID: 114020] [client.c:2352:notify] 0-glustervol1-client-1: parent translators are ready, attempting connect on transport
[2017-04-03 16:35:48.904234] E [MSGID: 114058] [client-handshake.c:1538:client_query_portmap_cbk] 0-glustervol1-client-0: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
[2017-04-03 16:35:48.904286] I [MSGID: 114018] [client.c:2276:client_rpc_notify] 0-glustervol1-client-0: disconnected from glustervol1-client-0. Client process will keep trying to connect to glusterd until brick's port is available
Final graph:
+------------------------------------------------------------------------------+
1: volume glustervol1-client-0
2: type protocol/client
3: option ping-timeout 42
4: option remote-host gluster1
5: option remote-subvolume /data/bricks/brick1/brick
6: option transport-type socket
7: option transport.address-family inet
8: option username xxx
9: option password xxx
10: option transport.socket.ssl-enabled on
11: option send-gids true
12: end-volume
13:
14: volume glustervol1-client-1
15: type protocol/client
16: option ping-timeout 42
17: option remote-host gluster2
18: option remote-subvolume /data/bricks/brick1/brick
19: option transport-type socket
20: option transport.address-family inet
21: option username xxx
22: option password xxx
23: option transport.socket.ssl-enabled on
24: option send-gids true
25: end-volume
26:
27: volume glustervol1-replicate-0
28: type cluster/replicate
29: option afr-pending-xattr glustervol1-client-0,glustervol1-client-1
30: option use-compound-fops off
31: subvolumes glustervol1-client-0 glustervol1-client-1
32: end-volume
33:
34: volume glustervol1-dht
35: type cluster/distribute
36: option lock-migration off
37: subvolumes glustervol1-replicate-0
38: end-volume
39:
40: volume glustervol1-write-behind
41: type performance/write-behind
42: subvolumes glustervol1-dht
43: end-volume
44:
45: volume glustervol1-read-ahead
46: type performance/read-ahead
47: subvolumes glustervol1-write-behind
48: end-volume
49:
50: volume glustervol1-readdir-ahead
51: type performance/readdir-ahead
52: option parallel-readdir off
53: option rda-request-size 131072
54: option rda-cache-limit 10MB
55: subvolumes glustervol1-read-ahead
56: end-volume
57:
58: volume glustervol1-io-cache
59: type performance/io-cache
60: subvolumes glustervol1-readdir-ahead
61: end-volume
62:
63: volume glustervol1-quick-read
64: type performance/quick-read
65: subvolumes glustervol1-io-cache
66: end-volume
67:
68: volume glustervol1-open-behind
69: type performance/open-behind
70: subvolumes glustervol1-quick-read
71: end-volume
72:
73: volume glustervol1-md-cache
74: type performance/md-cache
75: subvolumes glustervol1-open-behind
76: end-volume
77:
78: volume glustervol1
79: type debug/io-stats
80: option log-level INFO
81: option latency-measurement off
82: option count-fop-hits off
83: subvolumes glustervol1-md-cache
84: end-volume
85:
86: volume meta-autoload
87: type meta
88: subvolumes glustervol1
89: end-volume
90:
+------------------------------------------------------------------------------+
[2017-04-03 16:35:48.949500] I [rpc-clnt.c:1964:rpc_clnt_reconfig] 0-glustervol1-client-1: changing port to 49152 (from 0)
[2017-04-03 16:35:49.105087] I [socket.c:348:ssl_setup_connection] 0-glustervol1-client-1: peer CN = <name>
[2017-04-03 16:35:49.105103] I [socket.c:351:ssl_setup_connection] 0-glustervol1-client-1: SSL verification succeeded (client: <ip>:24007)
[2017-04-03 16:35:49.106999] I [MSGID: 114057] [client-handshake.c:1451:select_server_supported_programs] 0-glustervol1-client-1: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2017-04-03 16:35:49.109591] I [MSGID: 114046] [client-handshake.c:1216:client_setvolume_cbk] 0-glustervol1-client-1: Connected to glustervol1-client-1, attached to remote volume '/data/bricks/brick1/brick'.
[2017-04-03 16:35:49.109609] I [MSGID: 114047] [client-handshake.c:1227:client_setvolume_cbk] 0-glustervol1-client-1: Server and Client lk-version numbers are not same, reopening the fds
[2017-04-03 16:35:49.109713] I [MSGID: 108005] [afr-common.c:4756:afr_notify] 0-glustervol1-replicate-0: Subvolume 'glustervol1-client-1' came back up; going online.
[2017-04-03 16:35:49.110987] I [fuse-bridge.c:4146:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel 7.22
[2017-04-03 16:35:49.111004] I [fuse-bridge.c:4831:fuse_graph_sync] 0-fuse: switched to graph 0
[2017-04-03 16:35:49.112283] I [MSGID: 114035] [client-handshake.c:202:client_set_lk_version_cbk] 0-glustervol1-client-1: Server lk version = 1
[2017-04-03 16:35:52.547781] I [rpc-clnt.c:1964:rpc_clnt_reconfig] 0-glustervol1-client-0: changing port to 49152 (from 0)
[2017-04-03 16:35:52.558003] I [socket.c:348:ssl_setup_connection] 0-glustervol1-client-0: peer CN = <name>
[2017-04-03 16:35:52.558015] I [socket.c:351:ssl_setup_connection] 0-glustervol1-client-0: SSL verification succeeded (client: <ip>:24007)
[2017-04-03 16:35:52.558167] I [MSGID: 114057] [client-handshake.c:1451:select_server_supported_programs] 0-glustervol1-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2017-04-03 16:35:52.558592] I [MSGID: 114046] [client-handshake.c:1216:client_setvolume_cbk] 0-glustervol1-client-0: Connected to glustervol1-client-0, attached to remote volume '/data/bricks/brick1/brick'.
[2017-04-03 16:35:52.558604] I [MSGID: 114047] [client-handshake.c:1227:client_setvolume_cbk] 0-glustervol1-client-0: Server and Client lk-version numbers are not same, reopening the fds
[2017-04-03 16:35:52.558781] I [MSGID: 114035] [client-handshake.c:202:client_set_lk_version_cbk] 0-glustervol1-client-0: Server lk version = 1

Best Answer

First, the recommended solution

Perhaps you could try

ip:/volume /dir glusterfs defaults,noauto,x-systemd.automount,x-systemd.device-timeout=30,_netdev 0 0

Refer to the Arch Wiki: fstab, "Remote filesystem" section.
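Applied to the fstab line from the question, that would look something like this (the 30-second device timeout is just the value suggested above):

gluster1:/glustervol1 /data/glusterfs glusterfs defaults,noauto,x-systemd.automount,x-systemd.device-timeout=30,_netdev 0 0

With x-systemd.automount the volume is mounted on first access instead of during early boot, which sidesteps the race with glusterd.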

Because my OS is CentOS 6.9, which has no systemd, this does not work for me. (Maybe there are some options for init; please tell me if you know. :) )
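For a sysvinit box, one thing that may be worth checking (an assumption on my part, not something I have verified on CentOS 6.9) is that the netfs init script is enabled, since it is the script that mounts fstab entries marked _netdev once the network is up:

chkconfig netfs on    # netfs retries _netdev fstab entries after the network starts
service netfs status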

Second, the problem description

I had added the rule to fstab, but GlusterFS was not mounted automatically after boot. Version 3.10.

If I execute mount -a, the filesystem mounts fine.

Looking at /var/log/boot.log, I saw that the filesystem had failed to mount.

Looking at /var/log/glusterfs/<your gluster volume name>.log, it said that connecting to the Gluster server failed (but pinging the server worked fine).

I think the network may not have been ready yet when the mount was attempted?

Third, my inelegant solution

I searched through many issues, blogs, and forums, but the problem was never solved… 👿

In the end, I gave up and added a command to /etc/rc.local:

# give the network (and glusterd) time to come up, then retry the fstab mounts
sleep 30s
mount -a

This solution is ugly (maybe), but the world is beautiful again after the system reboots. 😂
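One caveat if you use the rc.local approach on a systemd machine such as the CentOS 7 boxes in the question: /etc/rc.d/rc.local is not executable by default there, so the snippet above only runs after you make it executable:

chmod +x /etc/rc.d/rc.local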
