Taking Cassandra Backups

amazon-s3, backup, cassandra

We currently have 12 nodes running in our Cassandra cluster. Ultimately even if a couple of the nodes go down, we're still up and running. The paranoia in me would like to do at least one backup a day and store it on Amazon S3. My question is the following:

When backing up Cassandra, is it sufficient to run the backup from one node, or do I have to run a backup script on each of the 12 nodes and push each node's backup to S3? If a restore is ever required, do we have to restore from each node's individual backup, or is there a way to "aggregate" the backups (assuming they must be taken from each node individually) into one large restore process?

Slightly confused by the documentation. Just want to get an efficient backup process rolling on my Cassandra cluster.

Best Answer

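You need to back up each node, because each node normally stores only part of the data. The one exception is when every node stores 100% of the data (for example, when the replication factor equals the number of nodes); in that case backing up a single node is enough.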
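
As a rough illustration of the per-node approach, here is a minimal Python sketch that could run on each of the 12 nodes (from cron, say). It takes a snapshot with `nodetool snapshot` and copies the snapshot files to S3 with the `aws s3 cp` CLI command, putting each node's files under its own prefix so that backups from different nodes do not overwrite each other. The function names, the bucket name, the key layout, and the data directory `/var/lib/cassandra/data` are assumptions made for this example, not something from the question; adjust them to your installation.

```python
import socket
import subprocess
from pathlib import Path


def take_snapshot(tag: str) -> None:
    """Ask the local Cassandra node to snapshot all keyspaces under `tag`."""
    subprocess.run(["nodetool", "snapshot", "-t", tag], check=True)


def plan_uploads(data_dir: Path, tag: str, node: str) -> list:
    """Return (local_path, s3_key) pairs for every file in snapshot `tag`.

    Assumed on-disk layout:
    <data_dir>/<keyspace>/<table>/snapshots/<tag>/<file>
    """
    plan = []
    for path in sorted(data_dir.glob(f"*/*/snapshots/{tag}/*")):
        if not path.is_file():
            continue  # this sketch only handles plain files
        keyspace, table = path.relative_to(data_dir).parts[:2]
        # Hypothetical key layout: one prefix per snapshot tag and node.
        plan.append((path, f"{tag}/{node}/{keyspace}/{table}/{path.name}"))
    return plan


def upload(plan: list, bucket: str) -> None:
    """Copy each planned file to S3 using the AWS CLI."""
    for path, key in plan:
        subprocess.run(
            ["aws", "s3", "cp", str(path), f"s3://{bucket}/{key}"],
            check=True,
        )


def backup_node(bucket: str, tag: str,
                data_dir: Path = Path("/var/lib/cassandra/data")) -> None:
    """Snapshot this node and push the snapshot files to `bucket`."""
    take_snapshot(tag)
    upload(plan_uploads(data_dir, tag, socket.gethostname()), bucket)
```

Each node would call something like `backup_node("my-cassandra-backups", "daily-2024-01-01")` with a bucket and tag of your choosing. Because the keys keep the node name, a restore can fetch exactly the files that belong to a given node.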