Context:
We have a Cassandra cluster with 3 nodes deployed as a StatefulSet in OpenShift. All three nodes are in the same datacenter and the same rack.
I also wrote a script to test Cassandra consistency-level errors. It runs as a pod within OpenShift, connects to the cluster, and runs a SELECT query in a loop. It knows the IP addresses of all Cassandra nodes.
Problem:
If I reduce the number of replicas in the StatefulSet from 3 to 2 (which also runs nodetool drain on the removed node), the script can no longer connect to the cluster. I get the following error:
cassandra.cluster.NoHostAvailable: ('Unable to connect to any servers', {
  '172.17.0.10': OSError(None, "Tried connecting to [('172.17.0.10', 9042)]. Last error: timed out"),
  '172.17.0.9': AuthenticationFailed('Failed to authenticate to 172.17.0.9: Error from server: code=0100 [Bad credentials] message="Error during authentication of user admin : org.apache.cassandra.exceptions.UnavailableException: Cannot achieve consistency level LOCAL_ONE"',),
  '172.17.0.8': ConnectionRefusedError(111, "Tried connecting to [('172.17.0.8', 9042)]. Last error: Connection refused"),
  '172.17.0.11': AuthenticationFailed('Failed to authenticate to 172.17.0.11: Error from server: code=0100 [Bad credentials] message="Error during authentication of user admin : org.apache.cassandra.exceptions.UnavailableException: Cannot achieve consistency level LOCAL_ONE"',)})
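The driver's NoHostAvailable exception keeps these per-host failures in its errors attribute, a dict keyed by host. The minimal sketch below sorts the hosts from the log into nodes that could not be reached at all and nodes that answered but failed the credentials lookup. The helper summarize_host_errors is hypothetical, and AuthenticationFailed is a stand-in class so the sketch runs without the driver installed.

```python
from typing import Dict, List


class AuthenticationFailed(Exception):
    """Stand-in for cassandra.AuthenticationFailed (assumption), so this
    sketch runs without the cassandra-driver package installed."""


def summarize_host_errors(errors: Dict[object, BaseException]) -> Dict[str, List[str]]:
    """Group the per-host errors found in NoHostAvailable.errors.

    - "unreachable": the TCP connection itself failed (timeout, refused, ...)
    - "auth_unavailable": the node answered, but the login failed because the
      credentials could not be read at the required consistency level
    - "other": any other error
    """
    summary: Dict[str, List[str]] = {"unreachable": [], "auth_unavailable": [], "other": []}
    for host, exc in errors.items():
        if isinstance(exc, OSError):
            # ConnectionRefusedError and socket timeouts are OSError subclasses.
            bucket = "unreachable"
        elif type(exc).__name__ == "AuthenticationFailed" and "UnavailableException" in str(exc):
            bucket = "auth_unavailable"
        else:
            bucket = "other"
        summary[bucket].append(str(host))
    return summary


# The errors from the log above, rebuilt by hand with shortened messages.
example_errors = {
    "172.17.0.10": OSError(None, "Last error: timed out"),
    "172.17.0.9": AuthenticationFailed("... UnavailableException: Cannot achieve consistency level LOCAL_ONE"),
    "172.17.0.8": ConnectionRefusedError(111, "Last error: Connection refused"),
    "172.17.0.11": AuthenticationFailed("... UnavailableException: Cannot achieve consistency level LOCAL_ONE"),
}

summary = summarize_host_errors(example_errors)
print(summary)
# {'unreachable': ['172.17.0.10', '172.17.0.8'], 'auth_unavailable': ['172.17.0.9', '172.17.0.11'], 'other': []}
```

With the real driver, the same helper can be applied to the errors attribute of a caught NoHostAvailable. The result shows that two hosts were reachable and failed only while reading the credentials.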
Question:
Since two nodes are still available, why can't authentication achieve the LOCAL_ONE consistency level, and how can I fix this?
Best Answer
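When you created the cluster, did you change the replication factor of the system_auth keyspace? If not, it still has the default replication factor of 1, so the credentials of user admin are stored on only one node. When that node is down, no live replica can serve the LOCAL_ONE read that the server performs during authentication, so the login fails with the UnavailableException shown in the error. To fix this, bring that node back, change the replication factor of system_auth to 3 (for example, ALTER KEYSPACE system_auth WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};), and then run nodetool repair system_auth on every node so that the existing credentials are copied to the new replicas. See detailed instructions here.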
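A toy model of replica placement shows why losing one node is enough. This is a minimal sketch that assumes SimpleStrategy-style placement with one token per node; real clusters use Murmur3 tokens and usually vnodes. The pod names and token values are hypothetical. When a StatefulSet is scaled down, it removes the pod with the highest ordinal, which here is cassandra-2.

```python
from bisect import bisect_left
from typing import Dict, List, Set


def replicas_for(token: int, ring: Dict[str, int], rf: int) -> List[str]:
    """Return the replicas of a token under SimpleStrategy-like placement.

    ring maps node name -> the node's single token. A node owns the range
    (previous token, own token], so the first replica is the first node whose
    token is >= the key's token (wrapping around the ring), and the remaining
    rf - 1 replicas are the next nodes clockwise.
    """
    ordered = sorted(ring, key=lambda node: ring[node])
    tokens = [ring[node] for node in ordered]
    start = bisect_left(tokens, token) % len(ordered)
    count = min(rf, len(ordered))
    return [ordered[(start + i) % len(ordered)] for i in range(count)]


def local_one_possible(token: int, ring: Dict[str, int], rf: int, live: Set[str]) -> bool:
    """In a single datacenter, LOCAL_ONE needs at least one live replica."""
    return any(node in live for node in replicas_for(token, ring, rf))


# Hypothetical three-node ring and a hypothetical token for the admin row.
ring = {"cassandra-0": -3000, "cassandra-1": 0, "cassandra-2": 3000}
live_after_scale_down = {"cassandra-0", "cassandra-1"}
admin_token = 2500

for rf in (1, 3):
    print(rf, replicas_for(admin_token, ring, rf),
          local_one_possible(admin_token, ring, rf, live_after_scale_down))
# 1 ['cassandra-2'] False
# 3 ['cassandra-2', 'cassandra-0', 'cassandra-1'] True
```

With replication factor 1, the admin row has a single replica. Once that node is gone, LOCAL_ONE cannot be met even though two nodes are up. With replication factor 3, every node holds a copy, so any single node can be down. Raising the replication factor does not move existing rows by itself. They reach the new replicas only after a repair, which is why the node holding the only copy has to be brought back first.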