Poor NFS Performance: OpenFiler

openfiler raid vmware-esx

Good Day Everyone,

I have an issue with OpenFiler, a Linux-based operating system that converts a computer into a SAN/NAS appliance. Here is the problem. In my environment we have two Netapp Storevault 500 appliances, and I normally run backups to NFS shares on them. Two backup cron jobs use ghettoVCB to back up two groups of VMs. The first job backs up a pool of 3 VMs and takes 13 minutes to complete. The second job backs up a pool of 5 VMs to the second Storevault appliance and takes 2 hours.

We then installed Openfiler on an old server that has 2 core Xeon processors and uses software RAID 5. When performing the same backups to an NFS share on Openfiler, the first backup job, which normally takes 13 minutes, takes around 4 hours. The second backup job, which normally takes 2 hours, takes almost 10 hours to complete. This is unacceptable, especially considering the strain placed on the host ESX server. I assumed that the CPU overhead of software RAID 5 explained the long backup times.

I then installed Openfiler on a second server, an IBM x306 machine with an Intel P4 processor. This time there is no software RAID, or any RAID at all: a single 750GB hard drive holds the OS, and the rest of the disk is used to back up VMs to an NFS share. I ran the first backup job, the pool of 3 VMs. This time the backup took an hour and a half to complete instead of 13 minutes!

Is Openfiler simply poor at being an NFS server? Has anyone else had these issues with Openfiler?

Best Answer

NFS with VMware is a special case. After every NFS transaction, VMware issues an NFS COMMIT, which forces the NFS server to sync the writes cached in RAM to the hard drives, greatly slowing things down. From what I can tell, there is no way to turn this off in VMware, nor to tune the NFS window size, which might alleviate the problem.

The reason you do not see this on Netapp is that, because they have battery-backed RAM, their implementation of NFS returns from a COMMIT immediately. Even if you pulled the power in the middle of a backup, the Netapp would still have consistent data when you plugged it back in.
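
To get a feel for how much a forced sync after every write can cost, here is a minimal local sketch in Python. It is only an analogy: it does not speak NFS at all, it just compares calling fsync after every write against syncing once at the end on whatever disk the temporary directory lives on. The block size, write count, and function names are my own assumptions for illustration, not anything VMware or OpenFiler actually uses.

```python
import os
import tempfile
import time

BLOCK = b"\0" * (64 * 1024)  # 64 KiB per write (arbitrary size, an assumption)
COUNT = 200                  # number of writes per run (also arbitrary)


def write_blocks(path, sync_every_write):
    """Write COUNT blocks to path and return the elapsed time in seconds."""
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(COUNT):
            f.write(BLOCK)
            if sync_every_write:
                # Roughly what a COMMIT after every transaction forces:
                # push the cached data to the disk before continuing.
                f.flush()
                os.fsync(f.fileno())
        # Both modes sync once at the end so the data is on disk either way.
        f.flush()
        os.fsync(f.fileno())
    return time.perf_counter() - start


def compare(directory):
    """Time both modes in the given directory and return {label: seconds}."""
    results = {}
    for label, sync in (("sync after every write", True),
                        ("sync once at end", False)):
        path = os.path.join(directory, "testfile.bin")
        results[label] = write_blocks(path, sync)
        os.remove(path)
    return results


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        for label, seconds in compare(d).items():
            print(f"{label}: {seconds:.3f} s")
```

On a plain spinning disk with no battery-backed cache, the first number is usually much larger than the second. On a RAM-backed filesystem or a controller with a write cache the gap may mostly disappear, which is the same effect described for the Netapp above.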

The fix, as I understand it, is either to tune OpenFiler's mounts with noatime and similar options, or to create iSCSI targets on OpenFiler, which will not have the same problem. A battery-backed caching RAID controller might also help; however, when backing up gigabytes in a single write, the cache may still be slower than the Netapps. You are using Gigabit Ethernet, right?