First some background:
I work at a company that runs a PHP web application. We have a storage backend mounted over NFS on several webservers. Today we have the issue that when one webserver writes a file over NFS, the file sometimes does not appear on the other mounted clients until a few minutes later. The storage is also not redundant, so we cannot perform any "invisible" maintenance.
I've been looking at migrating to a GlusterFS solution (two or three replicated bricks/machines for redundancy). Now, using XFS as the storage filesystem "behind" Gluster works very well, performance-wise. Gluster also does not seem to have the sync problem mentioned above.
However, I would like to use ZFS as the backend filesystem, the reasons being:
- Cheap compression (currently storing 1.5 TB uncompressed)
- Very easy to expand the storage volume "live" (one command, compared to the LVM mess; see the sketch after this list)
- Snapshotting, bit-rot protection and all the other ZFS glory
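What I mean by "one command" is roughly the following; the pool and device names are just placeholders, not my actual setup:

    # grow the pool by adding another disk/vdev
    zpool add storage /dev/sdc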
In my demo setup of the solution I have three servers running replicated Gluster, with a ZFS backend pool on a separate disk in each server. I'm using CentOS 6.5 with ZFS on Linux (0.6.2) + GlusterFS 3.4. I have also tried with Ubuntu 13.10. Everything runs in VMware ESX.
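Roughly how the demo volume is put together (hostnames, pool name and brick path below are placeholders, not my exact configuration):

    # on each server: ZFS pool on the spare disk, with a brick directory on top
    zpool create storage /dev/sdb
    mkdir -p /storage/brick

    # on one server: create and start a three-way replicated volume
    gluster volume create gvol replica 3 \
        server1:/storage/brick server2:/storage/brick server3:/storage/brick
    gluster volume start gvol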
To test this setup I have mounted the volume over Gluster and then run BlogBench (http://www.pureftpd.org/project/blogbench) to simulate load. The issue I'm having is that at the end of the test, the ZFS storage seems to get stuck in a deadlock. All three machines have "zfs_iput_taskq" running at 90-100% CPU, and the test freezes. If I abort the test, the deadlock does not go away; the only option seems to be a hard reboot.
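For reference, the test itself boils down to something like this (the volume name and mount point are placeholders):

    # mount the replicated volume via the Gluster client and point BlogBench at it
    mount -t glusterfs server1:/gvol /mnt/gluster
    blogbench -d /mnt/gluster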
I have tried:
- Disabled atime (the first three items roughly as sketched after this list)
- Set the I/O scheduler to noop
- Different compression / no compression
- BlogBench directly on ZFS works fine
- BlogBench on Gluster + XFS as backend works fine
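Roughly what those first three attempts looked like (dataset and device names below are placeholders):

    # disable atime on the ZFS dataset backing the brick
    zfs set atime=off storage

    # switch the backing disk to the noop scheduler
    echo noop > /sys/block/sdb/queue/scheduler

    # try with and without compression
    zfs set compression=on storage
    zfs set compression=off storage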
Any ideas? Should I just drop ZFS and go with something else? Any alternatives?
Regards, Oscar
Best Answer
ZFS on Linux needs a bit of basic tuning in order to operate well under load. There's a bit of a struggle between the ZFS ARC and the Linux virtual memory subsystem.
For your CentOS systems, try the following:
Create an /etc/modprobe.d/zfs.conf configuration file. This is read during module load at boot. Add something like the following:
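A sketch of what that file could contain; the numbers are examples only (the ARC size follows the ~40%-of-RAM / 1.2 GB guidance below, and the queue depth assumes SAS-class disks):

    # /etc/modprobe.d/zfs.conf -- example values, adjust to your hardware
    options zfs zfs_arc_max=1200000000 zfs_vdev_max_pending=12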
Where zfs_arc_max is roughly 40% of your RAM in bytes (Edit: try zfs_arc_max=1200000000). The compiled-in default for zfs_vdev_max_pending is 8 or 10, depending on version. The value should be high (48) for SSDs or low-latency drives, and maybe 12-24 for SAS; otherwise, leave it at the default. You'll also want to set some floor values in /etc/sysctl.conf:
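Something along these lines; the exact figures below are illustrative rather than prescriptive:

    # /etc/sysctl.conf -- keep some memory free and discourage swapping (example values)
    vm.swappiness = 10
    vm.min_free_kbytes = 512000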
Finally, with CentOS, you may want to install tuned and tuned-utils and set your profile to virtual-guest with tuned-adm profile virtual-guest.
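On CentOS 6 that amounts to roughly:

    # install the tuning framework and apply the virtualization-friendly profile
    yum install tuned tuned-utils
    tuned-adm profile virtual-guest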
Try these and see if the problem persists.
Edit:
Run zfs set xattr=sa storage. Here's why. You may have to wipe the volumes and start again (I'd definitely recommend doing so).
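For completeness, you can check that the property took effect with zfs get (storage being the dataset backing the bricks):

    # confirm that new xattrs will be stored as system attributes
    zfs get xattr storage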