Linux – Identifying cause of GlusterFS data corruption

glusterfs, linux, network-share, Ubuntu

I have been experiencing data corruption when writing data to a replicated GlusterFS volume I have configured across two servers.

The configuration I have set up is as follows:

  • Servers are running Ubuntu 16.04 and GlusterFS v3.10.6
  • Clients are running Ubuntu 14.04 and GlusterFS v3.10.6
  • Two GlusterFS volumes have been configured, each with two bricks, one brick on each server.
  • Each brick is an MDADM RAID 5 array with an EXT4 file system on LUKS.
  • Each volume is configured with the default options, plus bitrot detection. These are as follows:

    features.scrub: Active
    features.bitrot: on
    features.inode-quota: on
    features.quota: on
    nfs.disable: on
    

The data corruption manifests itself when large directories are copied from the local file system on one of the client machines to either of the configured GlusterFS volumes. When md5 checksums are calculated for the copied files and the source files and the two are compared, a number of the checksums differ.
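The comparison step above can be sketched as follows. This is a minimal, self-contained illustration: the temporary directories created here stand in for the real source directory and the mounted GlusterFS volume, whose actual paths are not given in the question.

```shell
SRC=$(mktemp -d)   # stands in for the local source directory
DST=$(mktemp -d)   # stands in for the mounted GlusterFS volume

# Create some sample data and copy it across, preserving attributes.
mkdir -p "$SRC/sub"
echo "payload one" > "$SRC/a.txt"
echo "payload two" > "$SRC/sub/b.txt"
cp -a "$SRC/." "$DST/"

# Checksum every file in both trees, sort by path, and diff the lists;
# any differing line points at a corrupted copy.
(cd "$SRC" && find . -type f -exec md5sum {} + | sort -k 2) > "$SRC.md5"
(cd "$DST" && find . -type f -exec md5sum {} + | sort -k 2) > "$DST.md5"
diff "$SRC.md5" "$DST.md5" && echo "checksums match"
```

On a healthy volume the diff is empty; on the affected setup described here, some lines would differ.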

Manually triggering a self-heal on either GlusterFS volume shows no files identified for healing. Additionally, the output of gluster volume bitrot <volname> scrub status and the logs in /var/log/glusterfs/bitd.log and /var/log/glusterfs/scrub.log don't identify any errors.
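For reference, these are the checks described above, consolidated (they require a live Gluster cluster; <volname> is a placeholder for the actual volume name):

```shell
# Trigger a self-heal and list files pending heal.
gluster volume heal <volname>
gluster volume heal <volname> info

# Check the bitrot scrubber's results and its logs.
gluster volume bitrot <volname> scrub status
tail -n 50 /var/log/glusterfs/bitd.log /var/log/glusterfs/scrub.log
```

In this case all of the above came back clean, which is what makes the corruption hard to attribute to Gluster's replication or bitrot layers.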

These issues have only manifested themselves recently, after around a week of both volumes being used fairly heavily by ~10 clients.

I have tried taking the volumes offline and have tested writing data to each of the bricks directly via the underlying local file system and haven't been able to reproduce the issues.

To further debug the issue I configured a similar setup on VMs in VirtualBox and haven't been able to reproduce the problem there either. I am therefore at rather a loss as to the cause of these errors.

Any suggestions of further debugging steps I could take or known issues with GlusterFS and my configuration would be appreciated.

Best Answer

After being unable to get GlusterFS to behave properly I decided to move my setup to NFS, with a live master and a mirror synced every hour or so to provide a degree of fail over in the event of the main server going down.

Recently we were performing maintenance on the server providing the mirror and it turned out that we were having similar issues with data corruption over NFS on that server.

After much debugging of the possible causes of the corruption, we eventually tracked it down to hardware offloading to the network interface, after I noticed we were also occasionally getting "Disconnecting: Packet corrupt" errors with large packets over SSH.

Looking into possible causes of the SSH errors, I found the following Unix & Linux question: packet_write_wait Broken pipe even leaving top running?

Some of the discussion on that thread suggested that a buggy network interface driver could lead to packet corruption when segmentation and rx/tx checksumming are passed off to the interface.

After disabling rx/tx checksum and segmentation offloading (following the instructions in the following blog post: How to solve ssh disconnect packet corrupt problems) and testing the server under heavy network load, I found that the SSH errors and the data corruption over NFS went away.
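The offloads can be disabled with ethtool along these lines. This is a sketch assuming the interface is named eth0 (adjust to your interface); it requires root, and the settings do not persist across reboots unless added to your network configuration.

```shell
# Show the current offload settings for the interface.
ethtool -k eth0

# Disable rx/tx checksum offloading.
ethtool -K eth0 rx off tx off

# Disable segmentation offloading (TCP segmentation, generic
# segmentation, and generic receive offload).
ethtool -K eth0 tso off gso off gro off
```

Disabling offloads shifts the checksumming and segmentation work back onto the CPU, so expect somewhat higher CPU usage under network load in exchange for bypassing the buggy hardware path.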

Since I no longer have GlusterFS configured on the servers, I am unable to verify this was the cause of the data corruption we experienced. However, given the issue persisted on one of the servers after we moved to NFS it is likely that this may have been the cause of our problems.

As a side note, the network interface was using the e1000e driver. I subsequently found the following discussion on the Red Hat bug tracker: Bug 504811 - e1000 silently corrupting data, which suggests that packet corruption is possible as a result of hardware offloading on certain cards using the e1000/e1000e drivers.
