Cassandra nodetool repair – how to schedule properly

cassandra

I'm putting together a 16-node Cassandra cluster (replication factor 2) and want to set up a schedule for nodetool repair. gc_grace_seconds is at the default.

Two questions:

  1. My first impulse is to set up a cron job on each machine and manually randomize the timing across a one-week schedule. Is there a better way?
  2. Does nodetool repair have to be run on every node, or only on (number of nodes / replication factor) of them? (I.e., for my 16 nodes with replication factor 2, on 8 nodes, one of each pair.)

Best Answer

I would not randomize it. Your best bet is to stagger the repairs on a fixed schedule so they don't stomp on each other.

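You should use the -pr option on each node when running repair. With -pr, each node repairs only its primary token range, so the repair has to run on every node, not just on one of each pair.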
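One way to stagger the repairs is to generate the crontab entries from the node list instead of picking times by hand. The following is only a minimal sketch: the host names and the function name staggered_repair_schedule are made up for illustration, and it assumes each node's repair finishes within its slot (10.5 hours for 16 nodes spread over one week). Check that assumption against how long repairs actually take on your data.

```python
def staggered_repair_schedule(hosts, window_days=7):
    """Return (host, crontab line) pairs with start times spread evenly
    over a weekly window so that repairs do not start at the same time.

    Assumption: one repair completes before the next slot begins.
    """
    if not hosts:
        return []
    if not 1 <= window_days <= 7:
        raise ValueError("window_days must be between 1 and 7")

    window_minutes = window_days * 24 * 60
    slot = window_minutes // len(hosts)  # minutes between consecutive starts

    schedule = []
    for i, host in enumerate(hosts):
        offset = i * slot
        day_of_week = offset // (24 * 60)  # cron convention: 0 = Sunday
        hour = (offset % (24 * 60)) // 60
        minute = offset % 60
        # -pr limits the repair to this node's primary token range
        line = f"{minute} {hour} * * {day_of_week} nodetool repair -pr"
        schedule.append((host, line))
    return schedule


if __name__ == "__main__":
    # Hypothetical host names for a 16-node cluster
    nodes = [f"node{i:02d}" for i in range(16)]
    for host, line in staggered_repair_schedule(nodes):
        print(f"{host}: {line}")
```

Each printed line would go into the crontab of the corresponding host. A one-week cycle keeps every node repaired within the default gc_grace_seconds of 10 days.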

If you're using Cassandra 2.1 or later, you also have the option of incremental repair, which will speed things up considerably.

RF=2 is also a recipe for disaster: a quorum of 2 replicas is 2, so quorum queries on the affected token ranges will fail if a single node is unavailable. I recommend RF=3.
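The arithmetic behind that recommendation can be checked with a short sketch. Cassandra computes a quorum as floor(RF / 2) + 1; the helper names below are made up for illustration.

```python
def quorum(replication_factor):
    """Replicas that must respond for a QUORUM read or write."""
    return replication_factor // 2 + 1


def tolerated_failures(replication_factor):
    """Replicas that can be down while QUORUM still succeeds."""
    return replication_factor - quorum(replication_factor)


if __name__ == "__main__":
    for rf in (2, 3):
        print(f"RF={rf}: quorum={quorum(rf)}, "
              f"can lose {tolerated_failures(rf)} replica(s)")
```

With RF=2 the quorum is both replicas, so no replica may be down. With RF=3 the quorum is still 2, which leaves room for one unavailable node.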
