I have this scenario:
srv01
srv02
srv03
There is a Gluster volume "vol1" running on srv03, and all the servers can use it for I/O. vol1 contains a lot of mixed-size images, ranging from a few KB to 3-4 MB. The total amount is about 1.5 TB.
Gluster version is 3.6.2
It's not a silver bullet and needs some tuning, but it works pretty well.
Now I have to replicate srv03's brick to the other servers.
The problem is that srv03's CPU skyrockets to 100% and it cannot serve normal
requests. Network traffic is low.
The volume options are:
cluster.data-self-heal-algorithm: full
cluster.self-heal-daemon: off
performance.cache-size: 1gb
I have to keep the service running while the replication is in progress. Your suggestions are welcome.
Best Answer
I am working on a somewhat similar situation. If your bottleneck is the CPU, I think that decreasing
cluster.background-self-heal-count
should help (the default is 16). In other words, "when your client tries to open 17 files, it'll hang on the 17th waiting for a self-heal" (https://botbot.me/freenode/gluster/msg/45681458/).
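
As a rough sketch of how that could be applied: the volume name "vol1" comes from the question, while the value 4 is only an assumed starting point (not a tested recommendation), and the `run` wrapper is a hypothetical helper that just prints the commands unless APPLY=1 is set.

```shell
#!/bin/sh
# Sketch only: lower the background self-heal count on the volume.
VOLUME="vol1"     # volume name taken from the question
HEAL_COUNT=4      # assumed example value, lower than the default of 16

# Hypothetical helper: print the command by default, run it only when APPLY=1.
run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "$@"
    fi
}

# Set the option on the volume.
run gluster volume set "$VOLUME" cluster.background-self-heal-count "$HEAL_COUNT"

# Show the volume info; changed options are listed under "Options Reconfigured".
run gluster volume info "$VOLUME"
```

Run it once as-is to check the commands, then with APPLY=1 on one of the servers. If the CPU load is still too high, try a lower value and watch how it behaves.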