GlusterFS Mount Keeps Disconnecting Randomly

glusterfs mount ubuntu-12.04

We are using GlusterFS to provide a distributed file system on two web servers that share a media directory and a cache. I have created two Gluster volumes (media and var) and mounted them at /var/www/site/media and /var/www/site/var/.

Each server runs both the Gluster server and the client, so the data stays replicated and we have some redundancy. The var volume is read and written very heavily.

The problem we are encountering is that the mounts break at random. When that happens, running ls -lah shows the directory as d???????. To recover, all we have to do is umount the directory and remount it.
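As a stopgap while the root cause is investigated, the check-and-remount step described above could be scripted. The following is only a minimal sketch: the function names `is_mount_healthy` and `remount` are made up for illustration, and `remount` assumes each mount point has an /etc/fstab entry so that a bare `mount <path>` works. It also assumes a broken mount fails fast with an error rather than hanging.

```python
import os
import subprocess


def is_mount_healthy(path):
    """Return True if the directory can be stat'ed and listed.

    A broken FUSE mount typically makes these calls raise OSError,
    which is what ls renders as d??????? in its output.
    """
    try:
        os.stat(path)
        os.listdir(path)
        return True
    except OSError:
        return False


def remount(path):
    """Unmount and remount path (hypothetical helper).

    Assumption: path is listed in /etc/fstab, so `mount path` is enough.
    Returns True if the mount command exited successfully.
    """
    subprocess.call(["umount", path])
    return subprocess.call(["mount", path]) == 0
```

Run from cron, a loop over /var/www/site/media and /var/www/site/var that calls `remount` whenever `is_mount_healthy` returns False would shorten the outage, but it only hides the symptom and does not fix whatever is breaking the mount.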

I have reviewed the GlusterFS log files and can see when the mount disappeared:

[2013-05-02 11:32:02.105021] I [client3_1-fops.c:502:client3_1_unlink_cbk] 0-site-media-client-1: remote operation failed: No such file or directory

[2013-05-02 11:32:02.105270] I [client3_1-fops.c:502:client3_1_unlink_cbk] 0-site-media-client-0: remote operation failed: No such file or directory

[2013-05-02 11:32:02.105299] W [fuse-bridge.c:911:fuse_unlink_cbk] 0-glusterfs-fuse: 11806336: UNLINK() /catalog/product/cache/1/image/1000x1000/9df78eab33525d08d6e5fb8d27136e95/v/e/some-stuff-0915740$

[2013-05-02 11:32:02.378497] I [client3_1-fops.c:502:client3_1_unlink_cbk] 0-site-media-client-0: remote operation failed: No such file or directory

[2013-05-02 11:32:02.378625] I [client3_1-fops.c:502:client3_1_unlink_cbk] 0-site-media-client-1: remote operation failed: No such file or directory

We would like to know what is causing these problems and how to resolve them, so that we can prevent these interruptions to the service.

If you require any more information, feel free to ask and I'll provide what I can.

The requested additional information is below. The two servers are identical:

Ubuntu 12.04.2

Linux VDED-XXX-XXX 3.2.0-39-generic #62-Ubuntu SMP Thu Feb 28 00:28:53 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

glusterfs 3.2.5 built on Jan 31 2012 07:39:59

VMware ESX servers

Errors From The Brick Logs at the time of the crash/disconnection/problem:

site-media brick log:

[2013-05-02 11:32:00.849296] I [server3_1-fops.c:964:server_unlink_cbk] 0-site-media-server: 9109306: UNLINK /catalog/product/cache/1/image/1000x1000/9df78eab33525d08d6e5fb8d27136e95/v/e/some-stuff-091574183930-box.jpg (0) ==> -1 (No such file or directory)

[2013-05-02 11:32:02.86607] I [server3_1-fops.c:964:server_unlink_cbk] 0-site-media-server: 9109345: UNLINK /catalog/product/cache/1/image/1000x1000/9df78eab33525d08d6e5fb8d27136e95/v/e/some-stuff-091574098692.jpg (0) ==> -1 (No such file or directory)

[2013-05-02 11:32:02.105131] I [server3_1-fops.c:964:server_unlink_cbk] 0-site-media-server: 12553441: UNLINK /catalog/product/cache/1/image/1000x1000/9df78eab33525d08d6e5fb8d27136e95/v/e/some-stuff-091574097992-box.jpg (0) ==> -1 (No such file or directory)

[2013-05-02 11:32:02.485694] W [inode.c:1044:inode_path] (-->/usr/lib/glusterfs/3.2.5/xlator/protocol/server.so(server_resolve+0xf8) [0x7f4534639418] (-->/usr/lib/glusterfs/3.2.5/xlator/protocol/server.so(server_resolve_inode+0x70) [0x7f4534639290] (-->/usr/lib/glusterfs/3.2.5/xlator/protocol/server.so(resolve_loc_touchup+0x105) [0x7f4534638425]))) 0-/var/gluster/wwrd-media/inode: no dentry for non-root inode 184269351: 11a65ece-7b4b-4364-a28c-63df686f5648

The site-var brick log doesn't seem to contain any errors.
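To see which operations dominate the client and brick logs around the time of a disconnect, the entries can be tallied by severity and callback name. This is a rough sketch that assumes every entry follows the `[timestamp] LEVEL [file:line:function] message` layout seen in the excerpts above. The names `LOG_RE`, `parse_line` and `summarize` are illustrative and not part of GlusterFS.

```python
import re
from collections import Counter

# Layout assumed from the excerpts: [timestamp] LEVEL [file:line:function] rest
LOG_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] (?P<level>[A-Z]) "
    r"\[(?P<file>[^:\]]+):(?P<line>\d+):(?P<func>[^\]]+)\] (?P<rest>.*)$"
)


def parse_line(line):
    """Return a dict of fields for one log entry, or None if it does not match."""
    match = LOG_RE.match(line.strip())
    return match.groupdict() if match else None


def summarize(lines):
    """Count log entries per (level, function) pair, skipping unparsable lines."""
    counts = Counter()
    for line in lines:
        record = parse_line(line)
        if record:
            counts[(record["level"], record["func"])] += 1
    return counts
```

Feeding it the lines of a log file, for example `summarize(open(path))`, gives a quick view of whether the failed UNLINK callbacks cluster around the moment the mount broke.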

Best Answer

Looking at the logs, this definitely looks like some kind of race condition, given how intensively that folder is being written to. Several bugs of this kind were reported against the 3.2.* versions of Gluster. I suggest upgrading to 3.3.1, which is fully supported on your OS and in which a lot of bugs and performance issues were resolved. You can upgrade from your current version.
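If there are several servers to check, a small script can flag which ones still run a release older than 3.3.1. This is a sketch only: it assumes the version banner has the same shape as the one quoted in the question ("glusterfs 3.2.5 built on ..."), and `parse_version` and `older_than` are illustrative names.

```python
import re


def parse_version(banner):
    """Extract (major, minor, patch) from a banner like 'glusterfs 3.2.5 built on ...'."""
    match = re.search(r"glusterfs (\d+)\.(\d+)\.(\d+)", banner)
    if match is None:
        raise ValueError("unrecognised version banner: %r" % banner)
    return tuple(int(part) for part in match.groups())


def older_than(banner, target=(3, 3, 1)):
    """Return True if the banner reports a version below target."""
    return parse_version(banner) < target
```

For the banner in the question, `older_than("glusterfs 3.2.5 built on Jan 31 2012 07:39:59")` evaluates to True, so both servers would be flagged for upgrade.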
