Ubuntu – GlusterFS fails to mount on boot but mounts later in Ubuntu 12.04

I have two machines, profitmargin and revisionist. I created a volume on profitmargin:

root@profitmargin:~# gluster volume info

Volume Name: uploads
Type: Distribute
Status: Started
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: profitmargin:/var/lib/gluster/brick01

and in revisionist I added this line to fstab to mount it at boot time:

profitmargin:/uploads /mnt/uploads glusterfs defaults,_netdev 0 0

but when the computer boots it's not mounted:

root@revisionist:~# mount
/dev/mapper/revisionist-root on / type ext4 (rw,errors=remount-ro)
proc on /proc type proc (rw,noexec,nosuid,nodev)
sysfs on /sys type sysfs (rw,noexec,nosuid,nodev)
none on /sys/fs/fuse/connections type fusectl (rw)
none on /sys/kernel/debug type debugfs (rw)
none on /sys/kernel/security type securityfs (rw)
udev on /dev type devtmpfs (rw,mode=0755)
devpts on /dev/pts type devpts (rw,noexec,nosuid,gid=5,mode=0620)
tmpfs on /run type tmpfs (rw,noexec,nosuid,size=10%,mode=0755)
none on /run/lock type tmpfs (rw,noexec,nosuid,nodev,size=5242880)
none on /run/shm type tmpfs (rw,nosuid,nodev)
/dev/sda1 on /boot type ext2 (rw)

In the log files I found this:

root@revisionist:~# cat /var/log/glusterfs/mnt-uploads.log
[2014-05-19 10:41:18.591355] I [glusterfsd.c:1493:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.2.5
[2014-05-19 10:41:18.704144] E [common-utils.c:125:gf_resolve_ip6] 0-resolver: getaddrinfo failed (Name or service not known)
[2014-05-19 10:41:18.704195] E [name.c:253:af_inet_client_get_remote_sockaddr] 0-glusterfs: DNS resolution failed on host profitmargin
[2014-05-19 10:41:18.704236] E [glusterfsd-mgmt.c:740:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to connect with remote-host: Success
[2014-05-19 10:41:18.704375] W [glusterfsd.c:727:cleanup_and_exit] (-->/usr/sbin/glusterfs(glusterfs_mgmt_init+0x1d0) [0x7f1bc152c850] (-->/usr/lib/libgfrpc.so.0(rpc_clnt_start+0x12) [0x7f1bc0e93c72] (-->/usr/sbin/glusterfs(+0x8abf) [0x7f1bc152cabf]))) 0-: received signum (1), shutting down
[2014-05-19 10:41:18.704400] I [fuse-bridge.c:3727:fini] 0-fuse: Unmounting '/mnt/uploads'.

and if I try to mount it later, it works:

root@revisionist:~# mount -a
root@revisionist:~# mount
/dev/mapper/revisionist-root on / type ext4 (rw,errors=remount-ro)
proc on /proc type proc (rw,noexec,nosuid,nodev)
sysfs on /sys type sysfs (rw,noexec,nosuid,nodev)
none on /sys/fs/fuse/connections type fusectl (rw)
none on /sys/kernel/debug type debugfs (rw)
none on /sys/kernel/security type securityfs (rw)
udev on /dev type devtmpfs (rw,mode=0755)
devpts on /dev/pts type devpts (rw,noexec,nosuid,gid=5,mode=0620)
tmpfs on /run type tmpfs (rw,noexec,nosuid,size=10%,mode=0755)
none on /run/lock type tmpfs (rw,noexec,nosuid,nodev,size=5242880)
none on /run/shm type tmpfs (rw,nosuid,nodev)
/dev/sda1 on /boot type ext2 (rw)
profitmargin:/uploads on /mnt/uploads type fuse.glusterfs (rw,allow_other,default_permissions,max_read=131072)

I am running Ubuntu 12.04 and I'm aware of the bug related to the init script, but I'm running GlusterFS version 3.2.5-1ubuntu1, which has the fix. I am also aware of some IPv6-related issues, so I made sure that both IPv4 and IPv6 work fine:

root@revisionist:~# ping profitmargin
PING profitmargin (192.168.1.111) 56(84) bytes of data.
64 bytes from profitmargin (192.168.1.111): icmp_req=1 ttl=64 time=0.355 ms
64 bytes from profitmargin (192.168.1.111): icmp_req=2 ttl=64 time=0.417 ms
^C
--- profitmargin ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 0.355/0.386/0.417/0.031 ms
root@revisionist:~# ping6 profitmargin
PING profitmargin(profitmargin) 56 data bytes
64 bytes from profitmargin: icmp_seq=1 ttl=64 time=0.637 ms
64 bytes from profitmargin: icmp_seq=2 ttl=64 time=0.472 ms
64 bytes from profitmargin: icmp_seq=3 ttl=64 time=0.407 ms
64 bytes from profitmargin: icmp_seq=4 ttl=64 time=0.393 ms
64 bytes from profitmargin: icmp_seq=5 ttl=64 time=0.402 ms
^C
--- profitmargin ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 3998ms
rtt min/avg/max/mdev = 0.393/0.462/0.637/0.092 ms

Any other ideas what could be causing this issue and/or how to fix it?

Best Answer

I am no expert, but this seems to be caused by revisionist failing to resolve profitmargin's IP address at boot: the log shows getaddrinfo failing with "Name or service not known". The mount is probably attempted before networking is up, which is why revisionist is not able to locate profitmargin. Once the network is up, the name resolves and the mount succeeds. You should find a way to make the GlusterFS mount wait until the network is online.
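
One way to make the mount wait could be sketched as a small shell helper that polls name resolution before mounting. This is only a sketch, not a tested fix for this setup: the function name wait_for_host and the retry count are made up for illustration, and it assumes getent is available on revisionist (it ships with libc on Ubuntu).

```shell
#!/bin/sh
# Hypothetical helper (not part of GlusterFS or Ubuntu): poll until a
# hostname resolves, giving up after a number of one-second attempts.
# Usage: wait_for_host HOST [TRIES]
wait_for_host() {
    host=$1
    tries=${2:-30}          # assumed default: about 30 seconds
    while [ "$tries" -gt 0 ]; do
        # getent consults the same resolver configuration (nsswitch)
        # that other programs on the machine use.
        if getent hosts "$host" >/dev/null 2>&1; then
            return 0        # name resolves, safe to try the mount
        fi
        tries=$((tries - 1))
        sleep 1
    done
    return 1                # still unresolved after all attempts
}

# Example call (commented out so that sourcing this file mounts nothing):
# wait_for_host profitmargin 30 && mount /mnt/uploads
```

The example call could be run from something that executes late in boot, such as /etc/rc.local (an assumption about where it fits best on this machine). Since /mnt/uploads is already listed in fstab, a plain "mount /mnt/uploads" is enough once the name resolves.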