Persistent storage in EKS cluster with multiple availability zones

amazon-ebs, amazon-web-services, kubernetes, rabbitmq

I have an EKS cluster with one Linux worker node, which may be instantiated in any Availability Zone within the region. I need a persistent storage volume so my data won't be lost if the node dies. It is worth mentioning that I'm talking about RabbitMQ data.

I've tried using an EBS volume, but it has a hard limitation: it is bound to a single Availability Zone. If the node dies and then comes back up in a different AZ, it fails to mount the EBS volume.
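To make the limitation concrete, here is a rough, illustrative excerpt of what a dynamically provisioned EBS-backed PersistentVolume looks like. The volume ID, zone, and exact topology key are placeholders and depend on the provisioner/CSI driver version, but the node-affinity rule is what pins the volume (and any pod using it) to a single AZ:

```yaml
# Illustrative excerpt only; volume ID, zone and topology key are placeholders.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pvc-example                      # hypothetical name assigned by dynamic provisioning
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  awsElasticBlockStore:
    volumeID: vol-0123456789abcdef0      # placeholder EBS volume ID
    fsType: ext4
  nodeAffinity:                          # pins the volume to the AZ it was created in
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: topology.kubernetes.io/zone   # older clusters use failure-domain.beta.kubernetes.io/zone
              operator: In
              values:
                - us-east-1a             # placeholder AZ
```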

So far I have the following ideas:

  1. Have a single EBS volume attached to a worker node. When the worker node restarts in a different Availability Zone, create an EBS snapshot, and use it to create a new EBS volume in the correct Availability Zone. The new node instance will mount the new EBS volume.

  2. Have a worker node in each Availability Zone, each with a dedicated EBS volume. RabbitMQ can automatically replicate the data across the EBS volumes (see the StatefulSet sketch after this list). This eliminates the need for EBS snapshots, as suggested in solution 1.

  3. Have a single EFS file system, which can be mounted by nodes in all Availability Zones.
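For option 2, a minimal sketch (assuming the stock rabbitmq image, an EBS-backed StorageClass named gp2, and a matching headless Service named rabbitmq) would spread one replica per AZ and give each its own zone-local EBS volume, leaving data replication to RabbitMQ itself:

```yaml
# Sketch only: names, image tag and storage class are assumptions.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: rabbitmq
spec:
  serviceName: rabbitmq                  # assumes a headless Service with this name
  replicas: 3
  selector:
    matchLabels:
      app: rabbitmq
  template:
    metadata:
      labels:
        app: rabbitmq
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: rabbitmq
              topologyKey: topology.kubernetes.io/zone   # one replica per AZ
      containers:
        - name: rabbitmq
          image: rabbitmq:3.8-management                 # assumed image/tag
          volumeMounts:
            - name: data
              mountPath: /var/lib/rabbitmq
  volumeClaimTemplates:                  # each replica gets its own EBS-backed volume
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: gp2            # assumed EBS-backed StorageClass name
        resources:
          requests:
            storage: 10Gi
```

With this layout, losing a node (or even a whole AZ) only takes down one replica, and RabbitMQ's own replication keeps the data available on the others.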

In addition, I came across this post, which explains more sophisticated approaches to my issue:

The other option I would recommend for Kubernetes 1.10/1.11 is to control where your volumes are created and where your pods are scheduled:
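The quoted option most likely refers to topology-aware volume binding. A minimal sketch, assuming the in-tree EBS provisioner and illustrative zone values, looks like this:

```yaml
# Sketch only: the class name and zones are illustrative; newer clusters use
# the ebs.csi.aws.com provisioner and the topology.kubernetes.io/zone label.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-topology-aware
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2
volumeBindingMode: WaitForFirstConsumer   # create the volume in the zone the pod is scheduled to
allowedTopologies:
  - matchLabelExpressions:
      - key: failure-domain.beta.kubernetes.io/zone   # zone label used by 1.10/1.11 nodes
        values:
          - us-east-1a
          - us-east-1b
```

This delays volume creation until a pod using the claim is scheduled, so the volume is always created in the zone the pod actually lands in. It does not, however, help once a volume already exists in a zone that no longer has a healthy node.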

Can you help me compare these approaches, for example in terms of scalability, cost-efficiency, and maintainability?
Or perhaps you can think of a better one?

Best Answer

The solution to this problem is to use EFS instead of EBS; this ensures that when a node dies, new pods will be able to connect to the same storage.

EFS is replicated across multiple Availability Zones, and it costs roughly 3x more than EBS.
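If you go the EFS route, a minimal sketch of statically provisioning an EFS-backed volume with the AWS EFS CSI driver might look like the following; the file system ID, names, and size are placeholders, and the driver itself has to be installed in the cluster first:

```yaml
# Sketch only: fs-0123456789abcdef0 and all names are placeholders; requires
# the aws-efs-csi-driver to be installed and EFS mount targets in each AZ.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: efs-sc
provisioner: efs.csi.aws.com
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: rabbitmq-efs-pv
spec:
  capacity:
    storage: 5Gi                         # required by the API, effectively ignored by EFS
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: efs-sc
  csi:
    driver: efs.csi.aws.com
    volumeHandle: fs-0123456789abcdef0   # placeholder EFS file system ID
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rabbitmq-data
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: efs-sc
  resources:
    requests:
      storage: 5Gi
```

Because the access mode is ReadWriteMany and EFS has mount targets in every AZ, a replacement pod in any zone can reattach to the same data.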

You may also want to consider a more cost-effective solution with less admin overhead by using a hosted message queue service such as Kafka or Kinesis.
