When and how should we shard MongoDB when we are bound to physical machines

mongodbreplicationsharding

We maintain a search service that serves data from MongoDB. Our Mongo production instance is arranged in a 4 node replica set across four physical servers.

The database is comprised of several small collections and one large collection. The large collection has the following characteristics:

  • number of documents: 35 million
  • average document size: ~4.2 kB
  • collection size: 151 GB
  • storageSize: 157 GB

Over the next year we anticipate that the number of documents in this collection will double to ~70 million and a doubling in the size of the collection.

I am conscious that the "Sharding Existing Collection Data Size" section of the Mongo Reference Limits document, it's specified that "For existing collections that hold documents, MongoDB supports enabling sharding on any collections that contains less than 256 gigabytes of data. MongoDB may be able to shard collections with as many as 400 gigabytes depending on the distribution of document sizes". Consequently, we would like to shard well before we reach the 256 gigabytes of data.

We are have some constraints on resourcing and we are not (yet) in a position to virtualise. However, we are in a position where I can purchase two new servers, bringing the total to six production machines.

My question is, is it possible to split Mongo into two shards where each one is a 3-server replica set with only six physical servers? I am conscious that in addition to the replica sets we require three config servers and a mongos server?

Should we even be sharding? Our current RAM usage and the number of connections are currently well within acceptable levels. Is there other strategies we might adopt to enable our database to grow that doesn't involve sharding?

Best Answer

1) why do u need 4 nodes for replica set? using even number of nodes in a replica set can be very problematic, since when a failover happens, there is an election between the nodes to decide which will become the primary, read this -> http://docs.mongodb.org/manual/core/replica-set-elections/

3 nodes are more than enough, 2 actual db nodes and 1 small arbitar that just helps in election

2) regarding shard cluster -> the minimum number of physical servers for a cluster with 2 shards with the minimum replica set per shard is 9(!), the split is as follows: shard 1(replica set): 2 data nodes + 1 arbitar(can be micro instance) shard 2(replica set): 2 data nodes + 1 arbitar (can be micro instance) 3 config servers (MUST!!) - these can be rather small machines - we use t1.micro instance on amazon AWS.

Each shard u want to add to the cluster will cost u 3 more physical nodes as above.

mongos -> these are client instances that your applcation mongo driver should interact with. U can deploy them as part of any web server, so you dont need a separate machine.

see this for more info - http://docs.mongodb.org/manual/core/sharded-cluster-architectures-production/

Related Topic