Cassandra – hardware planning

cassandra

Briefly: if I have 5 TB of data and want to deploy it on 5 Cassandra servers, does each machine need 5 TB of disk space for data (not counting log space)? From the docs it sounds like Cassandra will at times need 2x the data size, so is that 10 TB per server or 10 TB total across the cluster?

How much RAM should each machine have? Assume that the 5 TB is all in the same column space. I had been planning to max out the RAM on each machine, but I'm not sure that's enough. Do I need an array of servers with a total of 5 TB of RAM?

Best Answer

If you spread your 5 TB of data evenly across your 5 servers, each server will host 1 TB of data. Because of compaction, each server will need 2 TB of disk space (in the worst case, a compaction needs twice as much disk space as the data it holds), which means 10 TB total in your cluster.

The case above assumes you store only a single replica of your data in the cluster. In that case, if a server fails, one fifth of your data will be unreachable. If you want to store 2 replicas of your data in the cluster, each node will hold 2 TB of data and need 4 TB of disk space, which means 20 TB total in your cluster.
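The arithmetic in both cases can be sketched as a small Python helper. This is only an illustration of the reasoning above, not a sizing tool: the function names and the `compaction_headroom` parameter are hypothetical, and the 2x factor is the worst-case assumption stated in this answer.

```python
def disk_per_node_tb(data_tb, nodes, replication_factor=1, compaction_headroom=2.0):
    """Estimate the disk space (in TB) each node needs.

    Assumes data is spread evenly across nodes and that a worst-case
    compaction needs `compaction_headroom` times the stored data
    (2x, as described in the answer above).
    """
    if nodes <= 0 or replication_factor <= 0:
        raise ValueError("nodes and replication_factor must be positive")
    # Data actually stored on one node, counting every replica.
    stored_per_node = data_tb * replication_factor / nodes
    return stored_per_node * compaction_headroom


def cluster_disk_tb(data_tb, nodes, replication_factor=1, compaction_headroom=2.0):
    """Estimate the total disk space (in TB) across the whole cluster."""
    per_node = disk_per_node_tb(data_tb, nodes, replication_factor, compaction_headroom)
    return per_node * nodes


if __name__ == "__main__":
    for rf in (1, 2):
        per_node = disk_per_node_tb(5, 5, replication_factor=rf)
        total = cluster_disk_tb(5, 5, replication_factor=rf)
        print(f"replicas={rf}: {per_node} TB per node, {total} TB total")
```

With 5 TB of data on 5 nodes, this gives 2 TB per node (10 TB total) for a single replica and 4 TB per node (20 TB total) for 2 replicas, matching the figures above.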
