No, there is no supported way to rename a shard at present, though as you mention you could remove and re-add it. Even then it is not as simple as you might think, because although you can specify a name when adding a shard, it does not end there - there is the replica set itself to worry about. The name you specify when adding is just the value of the _id; see the following example (my replica set is rs0, like yours):
mongos> db.adminCommand({addShard : "rs0/host.example.com:27017,host.example.com:27018", name : "rep0"});
mongos> sh.status()
--- Sharding Status ---
  sharding version: {
    "_id" : 1,
    "version" : 4,
    "minCompatibleVersion" : 4,
    "currentVersion" : 5,
    "clusterId" : ObjectId("539838845bc6bf5ee52a56ea")
  }
  shards:
    {  "_id" : "rep0",  "host" : "rs0/host.example.com:27017,host.example.com:27018" }
  databases:
    {  "_id" : "admin",  "partitioned" : false,  "primary" : "config" }
Note that all that has changed is the _id for the shard; the "host" value remains the same, because rs0 is the name of the replica set. If you try to use rep0 there, it will fail to add. Hence, all that a remove and re-add will give you is a mismatch between the two names.
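For reference, the remove and re-add cycle itself looks something like this from the mongos shell (a minimal sketch using the names from the example above; the drain requires the balancer to be running and can take a long time on a shard with a lot of data):
mongos> // Start draining chunks off the shard
mongos> db.adminCommand({removeShard : "rep0"})
mongos> // Re-run the same command periodically; once it reports "state" : "completed"
mongos> // the shard is gone and can be re-added (note rs0 in the host string stays):
mongos> db.adminCommand({addShard : "rs0/host.example.com:27017,host.example.com:27018", name : "rep0"})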
To change that host value, not only do you have to remove/re-add the shard, you also have to alter the replica set config before you re-add it. In other words, the replSet parameter must be changed to rep0 as well, and that means re-initializing the set - not an easy task either.
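For a sense of what that re-initialization involves, here is a rough outline in the shell (a sketch only, assuming the two-member set from the example; this is the commonly described procedure for renaming a set, so verify it against the documentation for your version before trying it on real data):
// 1. Shut down every member, then restart each one as a standalone
//    (i.e. without the replSet option) and clear the old set metadata:
> use local
> db.dropDatabase()   // removes the old replica set config and the oplog
// 2. Restart every member with replSet = rep0 (config file or --replSet),
//    then initiate the set again from one member:
> rs.initiate({
      _id : "rep0",   // must match the new replSet parameter
      members : [
          { _id : 0, host : "host.example.com:27017" },
          { _id : 1, host : "host.example.com:27018" }
      ]
  })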
Overall, it is possible to get to where you want, but there will be a large amount of work and it will not be quick (drains, re-init of the set), especially if you have a lot of data. For the sake of changing a couple of strings, I would generally recommend leaving them as-is.
It's hard to say for sure based on such limited information, but it looks like something is killing the connection between the secondary and the primary while it is trying to sync. If it is happening repeatedly at approximately the same time, that suggests something in your network is enforcing a maximum connection time. If it is happening at random, it suggests something flaky on the network itself (impossible to tell what without significant troubleshooting).
It's also obvious from the log snippet that the secondary is under heavy load when this happens, since it is taking multiple hours to run serverStatus (usually sub-100ms when not under load). Now, it is building an index at the time, which is a blocking operation, so that may be a red herring if it is a large index. If it is not a large index, then it suggests the secondary is somewhat under-provisioned in terms of resources.
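If you want to confirm that an index build is what is blocking things, you can look for it in the in-progress operations on the secondary (a sketch; the exact msg text and progress fields vary a little between versions):
// List in-progress operations and pick out index builds, which report
// a msg beginning with "Index Build" plus done/total progress counters:
> db.currentOp(true).inprog.filter(function(op) {
      return op.msg && /^Index Build/.test(op.msg);
  }).forEach(function(op) {
      printjson({ opid : op.opid, msg : op.msg, progress : op.progress });
  });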
If you can't break the loop, you can take other measures to get the secondary up and running, such as copying the data files from a healthy member; however, unless you have snapshotting available, that will involve stopping writes or taking downtime for the duration of the copy.
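As an illustration of the stop-the-writes part, the usual pattern is to hold an fsync lock on the member you copy from while the files are in flight (a sketch; fsyncLock behaviour depends on your version and storage engine, so check the docs first):
// On the source member (ideally another secondary, not the primary),
// flush to disk and block writes so the data files are consistent:
> db.fsyncLock()
// ... copy the dbpath contents to the new secondary (rsync, scp, etc.)
//     while the lock is held ...
> db.fsyncUnlock()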
Best Answer
First off, take a look and follow the steps outlined here:
http://www.mongodb.org/display/DOCS/Troubleshooting#Troubleshooting-Socketerrorsinshardedclustersandreplicasets
Next, look for any ulimit issues on the target host (each new socket requires a new file descriptor, and running out of them can cause this error):
http://www.mongodb.org/display/DOCS/Too+Many+Open+Files
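A quick way to see whether you are anywhere near the descriptor-derived connection cap is the connections section of serverStatus (the numbers below are illustrative only):
// Run on the suspect mongod/mongos:
> db.serverStatus().connections
// e.g. { "current" : 742, "available" : 19258 } - if "available" is low or
// near zero, raise the open-files ulimit for the user running the process.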
Finally, there are a couple of known issues with idle connections being used when they should not be, which can also contribute to this type of problem:
https://jira.mongodb.org/browse/SERVER-5793
Until SERVER-5632 is complete, the only remedy here is to flush the connections by restarting the mongod/mongos processes.