No, there is no supported way to rename a shard at present, though as you mention you could remove and re-add it. Even then it is not as simple as you might think, because although you can specify a name when adding a shard, it does not end there - there is the replica set itself to worry about. The name you specify when adding is just the value of the _id; see the following example (my replica set is rs0, like yours):
mongos> db.adminCommand({addShard : "rs0/host.example.com:27017,host.example.com:27018", name : "rep0"});
mongos> sh.status()
--- Sharding Status ---
  sharding version: {
    "_id" : 1,
    "version" : 4,
    "minCompatibleVersion" : 4,
    "currentVersion" : 5,
    "clusterId" : ObjectId("539838845bc6bf5ee52a56ea")
  }
  shards:
    {  "_id" : "rep0",  "host" : "rs0/host.example.com:27017,host.example.com:27018" }
  databases:
    {  "_id" : "admin",  "partitioned" : false,  "primary" : "config" }
Note that all that has changed is the _id for the shard; the "host" value remains the same, because rs0 is the name of the replica set. If you try to use rep0 there, it will fail to add. Hence, all that a remove and re-add will give you is a mismatch between the two names.
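For reference, the remove and re-add cycle itself looks something like this from the mongos shell (a minimal sketch using the names from the example above; the drain requires the balancer to be running and can take a long time on a shard with a lot of data):
mongos> // Start draining chunks off the shard
mongos> db.adminCommand({removeShard : "rep0"})
mongos> // Re-run the same command periodically; once it reports "state" : "completed"
mongos> // the shard is gone and can be re-added (note rs0 in the host string stays):
mongos> db.adminCommand({addShard : "rs0/host.example.com:27017,host.example.com:27018", name : "rep0"})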
To change that host value, not only do you have to remove/re-add the shard, you also have to alter the replica set config before you re-add it. In other words, the replSet parameter must be changed to rep0 as well, and that means re-initializing the set - not an easy task either.
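For a sense of what that re-initialization involves, here is a rough outline in the shell (a sketch only, assuming the two-member set from the example; this is the commonly described procedure for renaming a set, so verify it against the documentation for your version before trying it on real data):
// 1. Shut down every member, then restart each one as a standalone
//    (i.e. without the replSet option) and clear the old set metadata:
> use local
> db.dropDatabase()   // removes the old replica set config and the oplog
// 2. Restart every member with replSet = rep0 (config file or --replSet),
//    then initiate the set again from one member:
> rs.initiate({
      _id : "rep0",   // must match the new replSet parameter
      members : [
          { _id : 0, host : "host.example.com:27017" },
          { _id : 1, host : "host.example.com:27018" }
      ]
  })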
Overall, it is possible to get to where you want, but there will be a large amount of work and it will not be quick (drains, re-init of the set), especially if you have a lot of data. For the sake of changing a couple of strings, I would generally recommend leaving them as-is.
It's hard to say for sure based on such limited information, but it looks like something is killing the connection between the secondary and the primary while it is trying to sync. If it is happening repeatedly at approximately the same time, that suggests something in your network is enforcing a maximum connection time. If it is happening at random, it suggests something flaky on the network itself (impossible to tell what without significant troubleshooting).
It's also obvious from the log snippet that the secondary is under heavy load when this happens, since it is taking multiple hours to run serverStatus (usually sub-100ms when not under load). Now, it is building an index at the time, which is a blocking operation, so that may be a red herring if it is a large index. If it is not a large index, then it suggests the secondary is somewhat under-provisioned in terms of resources.
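If you want to confirm that an index build is what is blocking things, you can look for it in the in-progress operations on the secondary (a sketch; the exact msg text and progress fields vary a little between versions):
// List in-progress operations and pick out index builds, which report
// a msg beginning with "Index Build" plus done/total progress counters:
> db.currentOp(true).inprog.filter(function(op) {
      return op.msg && /^Index Build/.test(op.msg);
  }).forEach(function(op) {
      printjson({ opid : op.opid, msg : op.msg, progress : op.progress });
  });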
If you can't break the loop, you can take other measures to get the secondary up and running, such as copying the data files from a healthy member; however, unless you have snapshotting available, that will involve stopping writes or taking downtime for the duration of the copy.
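As an illustration of the stop-the-writes part, the usual pattern is to hold an fsync lock on the member you copy from while the files are in flight (a sketch; fsyncLock behaviour depends on your version and storage engine, so check the docs first):
// On the source member (ideally another secondary, not the primary),
// flush to disk and block writes so the data files are consistent:
> db.fsyncLock()
// ... copy the dbpath contents to the new secondary (rsync, scp, etc.)
//     while the lock is held ...
> db.fsyncUnlock()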
Best Answer
First off, take a look and follow the steps outlined here:
http://www.mongodb.org/display/DOCS/Troubleshooting#Troubleshooting-Socketerrorsinshardedclustersandreplicasets
Next, look for any ulimit issues on the target host (each new socket requires a new file descriptor, and running out of them can cause this error):
http://www.mongodb.org/display/DOCS/Too+Many+Open+Files
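A quick way to see whether you are anywhere near the descriptor-derived connection cap is the connections section of serverStatus (the numbers below are illustrative only):
// Run on the suspect mongod/mongos:
> db.serverStatus().connections
// e.g. { "current" : 742, "available" : 19258 } - if "available" is low or
// near zero, raise the open-files ulimit for the user running the process.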
Finally, there are a couple of known issues with idle connections being used when they should not be, which can also contribute to this type of problem:
https://jira.mongodb.org/browse/SERVER-5793
Until SERVER-5632 is complete, the only remedy here is to flush the connections by restarting the mongod/mongos processes.