high-availability cassandra openshift-origin – Fix Cassandra Authentication Consistency Level LOCAL_ONE Error

Context:

We have a three-node Cassandra cluster deployed as a StatefulSet in OpenShift. All three nodes are configured in the same datacenter and the same rack.

I also wrote a script to test for Cassandra consistency level errors. It runs as a pod within OpenShift, connects to the cluster, and runs a SELECT query in a loop. It knows the IP addresses of all Cassandra nodes.
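For illustration, a test loop of this kind might look roughly like the sketch below. This is not the original script: the names `run_query_loop` and `connect` are hypothetical, and the credentials, contact points, and query are placeholders supplied by the caller. The loop takes an `execute` callable so that it can be tried without a live cluster.

```python
import time


def run_query_loop(execute, query, attempts, delay_seconds=0.0):
    """Run `query` `attempts` times and return (succeeded, failed) counts."""
    succeeded = 0
    failed = 0
    for _ in range(attempts):
        try:
            execute(query)
            succeeded += 1
        except Exception as exc:  # e.g. Unavailable or NoHostAvailable
            failed += 1
            print("query failed:", exc)
        time.sleep(delay_seconds)
    return succeeded, failed


def connect(contact_points, username, password):
    """Open a session with the DataStax Python driver (cassandra-driver)."""
    # Imported lazily so run_query_loop works without the driver installed.
    from cassandra.auth import PlainTextAuthProvider
    from cassandra.cluster import Cluster

    auth = PlainTextAuthProvider(username=username, password=password)
    cluster = Cluster(contact_points=contact_points, auth_provider=auth)
    return cluster, cluster.connect()
```

With a live cluster, `session.execute` from `connect(...)` would be passed as the `execute` argument.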

Problem:

If I reduce the replica count from 3 to 2 in the StatefulSet (which also runs nodetool drain on the node being removed), the script can no longer connect to the cluster. I get the following error:

cassandra.cluster.NoHostAvailable: ('Unable to connect to any servers', {
'172.17.0.10': OSError(None, "Tried connecting to [('172.17.0.10', 9042)]. Last error: timed out"),
'172.17.0.9': AuthenticationFailed('Failed to authenticate to 172.17.0.9: Error from server: code=0100 [Bad credentials] message="Error during authentication of user admin : org.apache.cassandra.exceptions.UnavailableException: Cannot achieve consistency level LOCAL_ONE"',),
'172.17.0.8': ConnectionRefusedError(111, "Tried connecting to [('172.17.0.8', 9042)]. Last error: Connection refused"),
'172.17.0.11': AuthenticationFailed('Failed to authenticate to 172.17.0.11: Error from server: code=0100 [Bad credentials] message="Error during authentication of user admin : org.apache.cassandra.exceptions.UnavailableException: Cannot achieve consistency level LOCAL_ONE"',)})

Question:

Since two nodes are still available, why can't the authentication get the LOCAL_ONE consistency level, and how can I solve my issue?

Best Answer

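When you created the cluster, did you change the replication factor of the system_auth keyspace? By default it is 1, so each user's credentials are stored on a single node; if that node is down, the authentication read cannot achieve LOCAL_ONE even though other nodes are still up. If you did not change it, you need to bring that node back and change the replication factor of system_auth to 3.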

See detailed instructions here.
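As a minimal sketch of that change, assuming the DataStax Python driver (the one that produced the traceback in the question): the helper names `build_alter_system_auth` and `apply_statement` are hypothetical, and the datacenter name must match what `nodetool status` reports for your cluster. Without a datacenter name the sketch falls back to SimpleStrategy.

```python
def build_alter_system_auth(replication_factor, datacenter=None):
    """Return a CQL statement that sets the system_auth replication factor."""
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    if datacenter is None:
        # Single-datacenter fallback; no datacenter name needed.
        replication = (
            "{'class': 'SimpleStrategy', 'replication_factor': %d}"
            % replication_factor
        )
    else:
        # The datacenter name is an assumption supplied by the caller.
        replication = "{'class': 'NetworkTopologyStrategy', '%s': %d}" % (
            datacenter,
            replication_factor,
        )
    return "ALTER KEYSPACE system_auth WITH replication = " + replication + ";"


def apply_statement(contact_points, username, password, statement):
    """Execute one CQL statement as a user allowed to alter keyspaces."""
    # Imported lazily so the builder above works without the driver installed.
    from cassandra.auth import PlainTextAuthProvider
    from cassandra.cluster import Cluster

    auth = PlainTextAuthProvider(username=username, password=password)
    cluster = Cluster(contact_points=contact_points, auth_provider=auth)
    try:
        session = cluster.connect()
        session.execute(statement)
    finally:
        cluster.shutdown()
```

The same statement can be run directly in cqlsh. After altering the keyspace, run `nodetool repair system_auth` on each node so that the existing credentials are actually copied to the new replicas.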