I have posted this problem from here
I am running sharded mongodb in a kubernetes environment, with 3 shards, and 3 instances on each shard. for some reasons, my mongodb instance have been rescheduled to another machine.
the problem is when a mongodb instance have ben rescheduled to another instance, its replica config
will be invalidated. resulting to this error below.
> rs.status()
{
"state" : 10,
"stateStr" : "REMOVED",
"uptime" : 2110,
"optime" : Timestamp(1448462710, 6),
"optimeDate" : ISODate("2015-11-25T14:45:10Z"),
"ok" : 0,
"errmsg" : "Our replica set config is invalid or we are not a member of it",
"code" : 93
}
>
this is the config
> rs.config().members
[
{
"_id" : 0,
"host" : "mongodb-shard2-service:27038",
"arbiterOnly" : false,
"buildIndexes" : true,
"hidden" : false,
"priority" : 1,
"tags" : {
},
"slaveDelay" : 0,
"votes" : 1
},
{
"_id" : 1,
"host" : "shard2-slave2-service:27039",
"arbiterOnly" : false,
"buildIndexes" : true,
"hidden" : false,
"priority" : 1,
"tags" : {
},
"slaveDelay" : 0,
"votes" : 1
},
{
"_id" : 2,
"host" : "shard2-slave1-service:27033",
"arbiterOnly" : false,
"buildIndexes" : true,
"hidden" : false,
"priority" : 1,
"tags" : {
},
"slaveDelay" : 0,
"votes" : 1
}
]
and a sample of db.serverStatus()
of a rescheduled mongodb instance
> db.serverStatus()
{
"host" : "mongodb-shard2-master-ofgrb",
"version" : "3.0.7",
"process" : "mongod",
"pid" : NumberLong(8),
I hope I am making sense.. because, I will be using this in live production very soon.. thank you!!
Best Answer
For those who want to use the old way of setting up mongo (using ReplicationControllers or Deployments instead of PetSet), the problem seems to be in the hostname assignment delay of kubernetes Services. The solution is to add a 10 seconds delay in the container entrypoint (before starting the actual mongo):
related discussion: https://jira.mongodb.org/browse/SERVER-24778