MongoDB secondary replica permanently in not reachable/healthy state

mongodb

I have a standard 3-node MongoDB replica set:

  • 10.0.2.35 – Primary
  • 10.0.3.169 – Secondary
  • 10.0.1.48 – Secondary

I'm currently unable to connect to them as a replica set; I can only connect through the primary. If I run rs.status() on the primary, I consistently get:

{
        "set" : "ecReplica",
        "date" : ISODate("2018-04-23T19:12:10.014Z"),
        "myState" : 1,
        "term" : NumberLong(-1),
        "heartbeatIntervalMillis" : NumberLong(2000),
        "members" : [
                {
                        "_id" : 0,
                        "name" : "ip-10-0-3-169:27017",
                        "health" : 1,
                        "state" : 2,
                        "stateStr" : "SECONDARY",
                        "uptime" : 10717677,
                        "optime" : Timestamp(1524510722, 15),
                        "optimeDate" : ISODate("2018-04-23T19:12:02Z"),
                        "lastHeartbeat" : ISODate("2018-04-23T19:12:08.186Z"),
                        "lastHeartbeatRecv" : ISODate("2018-04-23T19:12:09.656Z"),
                        "pingMs" : NumberLong(1),
                        "syncingTo" : "ip-10-0-2-35:27017",
                        "configVersion" : 405240
                },
                {
                        "_id" : 1,
                        "name" : "ip-10-0-1-48:27017",
                        "health" : 0,
                        "state" : 6,
                        "stateStr" : "(not reachable/healthy)",
                        "uptime" : 0,
                        "optime" : Timestamp(0, 0),
                        "optimeDate" : ISODate("1970-01-01T00:00:00Z"),
                        "lastHeartbeat" : ISODate("2018-04-23T19:12:09.116Z"),
                        "lastHeartbeatRecv" : ISODate("2018-04-23T19:12:08.404Z"),
                        "pingMs" : NumberLong(0),
                        "authenticated" : false,
                        "configVersion" : -1
                },
                {
                        "_id" : 2,
                        "name" : "ip-10-0-2-35:27017",
                        "health" : 1,
                        "state" : 1,
                        "stateStr" : "PRIMARY",
                        "uptime" : 10717680,
                        "optime" : Timestamp(1524510722, 15),
                        "optimeDate" : ISODate("2018-04-23T19:12:02Z"),
                        "electionTime" : Timestamp(1524486537, 1),
                        "electionDate" : ISODate("2018-04-23T12:28:57Z"),
                        "configVersion" : 405240,
                        "self" : true
                }
        ],
        "ok" : 1
}
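In a longer status document the failing member is easy to miss. A minimal sketch, assuming the rs.status() result has already been loaded into plain Python dicts (for example through a driver); the function name `unhealthy_members` is hypothetical. It lists every member whose `health` is not 1, together with its `authenticated` flag, which is `false` for ip-10-0-1-48 above.

```python
def unhealthy_members(status):
    """Return (name, stateStr, authenticated) for each member whose health is not 1.

    `authenticated` is None when the member document has no such field.
    """
    return [
        (member["name"], member.get("stateStr", "?"), member.get("authenticated"))
        for member in status.get("members", [])
        if member.get("health") != 1
    ]
```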

When I ssh into the primary, I see the following error in /var/log/mongodb/mongod.log:

2018-04-23T19:23:54.326+0000 I REPL [ReplicationExecutor] Error in heartbeat request to ip-10-0-1-48:27017; Unauthorized not authorized on admin to execute command { replSetHeartbeat: "ecReplica", pv: 1, v: 405240, from: "ip-10-0-2-35:27017", fromId: 2, checkEmpty: false }
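To see which peers are rejecting heartbeats, and with which error, the log can be scanned for lines like the one above. A minimal sketch, assuming the 3.2-era plain-text log format shown here (newer versions log structured JSON instead); the pattern and the function name `heartbeat_errors` are hypothetical.

```python
import re

# Matches lines like "... Error in heartbeat request to HOST:PORT; CodeName message".
HEARTBEAT_ERROR = re.compile(
    r"Error in heartbeat request to (?P<host>[^;\s]+); (?P<code>\w+) (?P<message>.*)$"
)


def heartbeat_errors(lines):
    """Yield (host, error code name, message) for each heartbeat failure line."""
    for line in lines:
        match = HEARTBEAT_ERROR.search(line)
        if match:
            yield match.group("host"), match.group("code"), match.group("message")
```

For example, `heartbeat_errors(open("/var/log/mongodb/mongod.log"))` would yield `("ip-10-0-1-48:27017", "Unauthorized", ...)` for the line above.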

Additional info

Connectivity

I can connect to each of the 3 nodes individually with the mongo shell and Robo 3T over SSH tunnels, but I can't connect to them as a replica set.

The production servers apparently connect to the replica set successfully.
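Connecting "as a replica set" means giving the client a seed list plus the set name, rather than a single host. A minimal sketch that builds such a URI with the standard `replicaSet` connection-string option; the helper name `replica_set_uri` is hypothetical. Note that drivers typically then switch to the member names stored in the replica set configuration (such as ip-10-0-2-35:27017), so those names generally have to resolve and be reachable from the client, which a tunnel to one node does not provide.

```python
from urllib.parse import urlencode


def replica_set_uri(hosts, replica_set):
    """Build a mongodb:// seed-list URI that asks the driver to join the named set."""
    query = urlencode({"replicaSet": replica_set})
    return "mongodb://{}/?{}".format(",".join(hosts), query)
```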

telnet

Running telnet 10.0.1.48 27017 from 10.0.2.35 succeeds.
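The same reachability test can be scripted. A minimal sketch of a telnet-style check using only the standard library; the function name `tcp_reachable` is hypothetical. Like telnet, a successful connection only proves that the port accepts TCP connections; it says nothing about whether the heartbeat command is authorized, which is consistent with telnet working here while heartbeats fail.

```python
import socket


def tcp_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened, like telnet."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unresolvable host, ...
        return False
```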

/etc/mongod.conf

The config files are almost identical; the only difference is in the net section:

Node 10.0.1.48:

# network interfaces
net:
  port: 27017
  bindIp: [127.0.0.1,10.0.3.169,10.0.2.35]

Node 10.0.3.169:

# network interfaces
net:
  port: 27017
  bindIp: [10.0.1.48,10.0.2.35,127.0.0.1]

Node 10.0.2.35:

# network interfaces
net:
  port: 27017
  bindIp: [127.0.0.1,10.0.3.169,10.0.1.48]
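The claim that the files differ only in the net section can be checked mechanically rather than by eye. A small sketch using Python's standard `difflib.ndiff`; the helper name `changed_lines` is hypothetical, and the inputs in practice would be the full contents of each node's /etc/mongod.conf.

```python
import difflib


def changed_lines(a_text, b_text):
    """Return only the lines that differ, prefixed '- ' (only in a) or '+ ' (only in b)."""
    diff = difflib.ndiff(a_text.splitlines(), b_text.splitlines())
    # ndiff also emits '  ' (common) and '? ' (hint) lines; keep just the changes.
    return [line for line in diff if line.startswith(("- ", "+ "))]
```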

NOTICE: the security section is empty, so this is not a key-file issue.

db.version()

3.2.0

Infrastructure

All nodes run in the same AWS VPC. Although they are in different Availability Zones, they belong to the same Security Group and use the same Network ACLs and Route Tables.


This is an inherited setup; it has been live for more than 2 years.

What am I missing?

Best Answer

It seems that authentication is still disabled in your replica set settings.

To enable it, add the security.keyFile setting to mongod.conf, or pass the --keyFile command-line option when starting mongod.

Here is an example showing how to generate such a file:

openssl rand -base64 756 > <path-to-keyfile>
chmod 400 <path-to-keyfile>
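Where openssl is not available, the same kind of keyfile can be produced from Python. A minimal sketch, assuming a Unix host; the function name `write_keyfile` is hypothetical. It base64-encodes 756 random bytes (1008 characters, like the openssl command) but writes them on a single line instead of wrapping at 64 characters, creates the file owner-only, refuses to overwrite an existing file, and then applies the equivalent of chmod 400.

```python
import base64
import os
import secrets


def write_keyfile(path, n_bytes=756):
    """Write base64-encoded random bytes to a new file at `path` and make it read-only."""
    key = base64.b64encode(secrets.token_bytes(n_bytes)).decode("ascii")
    # O_EXCL: fail instead of clobbering an existing keyfile; 0o600 keeps it private.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write(key + "\n")
    os.chmod(path, 0o400)  # owner read-only, same as `chmod 400`
    return key
```

The file must still be owned by the user that runs mongod.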

Then add the path of the generated keyfile to mongod.conf:

security:
   authorization: enabled
   keyFile: /path/to/keyfile
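Before restarting, it can save a failed start to check the keyfile against mongod's documented rules: 6 to 1024 characters, base64 characters only, and (on Unix) no group or world permissions. A minimal sketch under those assumptions; the function name `keyfile_problems` is hypothetical, and it does not replace checking the mongod log after the restart.

```python
import os
import re
import stat


def keyfile_problems(path):
    """Return a list of likely reasons mongod would reject the keyfile (empty = looks OK)."""
    problems = []
    with open(path) as handle:
        # Whitespace such as openssl's line breaks is not counted as key material.
        content = re.sub(r"\s", "", handle.read())
    if not 6 <= len(content) <= 1024:
        problems.append(f"length must be 6-1024 characters, got {len(content)}")
    if not re.fullmatch(r"[A-Za-z0-9+/=]*", content):
        problems.append("only base64 characters are allowed")
    if os.stat(path).st_mode & (stat.S_IRWXG | stat.S_IRWXO):
        problems.append("file must not have group or world permissions")
    return problems
```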

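Copy the same keyfile to every member of the replica set, add this security section to each member's mongod.conf, and restart every mongod service. Access control should then be enabled.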

For more information about keyfiles, refer to "Enforce Keyfile Access Control in a Replica Set" in the MongoDB documentation.