CPU overuse while replicating a Gluster volume

glusterfs

I have this scenario:

srv01
srv02
srv03

There is a Gluster volume "vol1" running on srv03, and all the servers can use it for I/O. vol1 contains a lot of mixed-size images, ranging from a few KB to 3-4 MB. The total amount is about 1.5 TB.

Gluster version is 3.6.2

It's not a silver bullet and needs some tuning, but it works pretty well.

Now I have to replicate srv03's brick to the other servers.

The problem is that srv03's CPU skyrockets to 100% and it cannot serve normal
requests. Network traffic is low.

The volume options currently set are:

cluster.data-self-heal-algorithm: full

cluster.self-heal-daemon: off

performance.cache-size: 1gb

I have to keep the service running while the replication is in progress. Your suggestions are welcome.

Best Answer

I am working on a somewhat similar situation. If your bottleneck is the CPU, I think that decreasing cluster.background-self-heal-count should help (the default is 16). In other words, with the default, "when your client tries to open 17 files, it'll hang on the 17th waiting for a self-heal" (https://botbot.me/freenode/gluster/msg/45681458/).
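As a rough sketch of how that could be tried on the volume from the question: the value 4 below is only an illustrative assumption, not a tested recommendation, so it is worth starting conservatively and watching the CPU on srv03. The script only prints the command so it can be reviewed before it is run.

```shell
#!/bin/sh
# Sketch only: HEAL_COUNT=4 is an assumed, illustrative value.
VOLUME="vol1"     # volume name taken from the question
HEAL_COUNT=4      # the answer above cites 16 as the default

# Build the command first so it can be reviewed before touching the live volume.
CMD="gluster volume set $VOLUME cluster.background-self-heal-count $HEAL_COUNT"
echo "$CMD"

# To apply it, run the printed command on one of the servers, then check that
# the option is listed under "Options Reconfigured":
#   gluster volume info vol1
```

If the CPU load on srv03 stays too high, the same command can be repeated with a lower count; if healing becomes too slow, the count can be raised again.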
