What is the standard system architecture for MongoDB on Linux?

Tags: architecture, database, linux, mongodb, ubuntu

I know this kind of question can be vague, so here are some key numbers to describe the scenario:

Size of each document: 360 KB
Total documents: 1.5 million
Documents created per day: 2,000
Read-intensive: yes
Availability requirement: high

With these requirements in mind, here is what I believe the architecture should be. I'm not too sure about it, though, so please share your experiences and point me in the right direction.

2 Linux boxes (Ubuntu 11, each on a different rack for availability)
64-bit MongoDB
1 master (for reads/writes) and 1 slave (read-only, with replication on)
Sharding: not needed at this point

Best Answer

You're starting out with at least 500GB of data, and growing at a rate of ~700MB per day. You may want to consider sharding from the get-go (perhaps just a single shard) so you can keep the per-server data manageable. We (MongoHQ) have found that 500GB is a good upper limit for a single server/replica set setup. Sharding would require that you run at least one mongos and 3 config servers in addition to the replica set, and do the research to pick a good shard key.
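For reference, those figures fall straight out of the numbers in the question. A quick back-of-the-envelope calculation (raw document data only; indexes and storage overhead come on top):

```python
# Back-of-the-envelope sizing from the numbers in the question
# (raw document data only; indexes and storage overhead are extra).
DOC_SIZE_KB = 360        # average document size
TOTAL_DOCS = 1_500_000   # existing documents
DOCS_PER_DAY = 2_000     # new documents per day

total_gb = TOTAL_DOCS * DOC_SIZE_KB / 1024 / 1024
daily_mb = DOCS_PER_DAY * DOC_SIZE_KB / 1024

print(f"current data size: ~{total_gb:.0f} GB")  # ~515 GB
print(f"daily growth:      ~{daily_mb:.0f} MB")  # ~703 MB
```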

That said, you need to figure out how big your working set is and make sure you have enough RAM to hold it. The working set is defined as "the portion of documents + indexes you access over a given amount of time"; our typical rule of thumb is about 1GB of memory per 10GB of storage with slow-ish disks. This is highly, highly dependent on your data and access patterns, though. SSDs become useful when you have a pathological working set and keeping it all in memory would be expensive. Run mongostat against a simulated load and look at the "faults" column to get an idea of how often the DB is going to disk.
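If you'd rather script that check, here is a minimal PyMongo sketch that polls the same Linux page-fault counter mongostat reports in its "faults" column; the hostname and sampling interval are placeholders for your test instance:

```python
# Poll MongoDB's cumulative page-fault counter while a simulated load
# runs; this is the same signal as mongostat's "faults" column.
# The counter lives under serverStatus extra_info and is Linux-only.
# "test-host" and the 5-second interval are placeholders.
import time
from pymongo import MongoClient

client = MongoClient("mongodb://test-host:27017")

previous = None
for _ in range(12):  # sample for about a minute
    status = client.admin.command("serverStatus")
    faults = status["extra_info"]["page_faults"]
    if previous is not None:
        print(f"page faults in the last 5s: {faults - previous}")
    previous = faults
    time.sleep(5)
```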

A simple replica set is a good start. If you are doing reads from the secondary, though, you really should have a 3-member setup rather than just two (you'll need an arbiter for two anyway). People get themselves in trouble when they load up two servers with reads, one dies, and their app overwhelms the one remaining server. Having three smaller servers is much more desirable than having two larger ones.
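For illustration, a three-member replica set is just a config document listing the three hosts. A minimal sketch, assuming the hostnames are placeholders and each mongod is already running with --replSet rs0:

```python
# Sketch: initiate a 3-member replica set from PyMongo.
# Hostnames are placeholders; each mongod must already be running
# with --replSet rs0. directConnection needs a recent PyMongo.
from pymongo import MongoClient

config = {
    "_id": "rs0",
    "members": [
        {"_id": 0, "host": "mongo-a.example.com:27017"},
        {"_id": 1, "host": "mongo-b.example.com:27017"},
        {"_id": 2, "host": "mongo-c.example.com:27017"},
    ],
}

# Connect directly to the member that should run the initiation,
# then send the replSetInitiate admin command with the config above.
client = MongoClient("mongo-a.example.com", 27017, directConnection=True)
client.admin.command("replSetInitiate", config)
```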

Secondary reads can cause your app problems, too. You need to make sure your app can handle any replication lag you might encounter. You probably won't run into this right away, but it will happen if you ever take a secondary offline for maintenance and you read from it before it has time to catch up.
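On the application side, this is usually expressed through the driver's read preference. A minimal PyMongo sketch (the hosts, database/collection names, and 120-second bound are illustrative, and max_staleness requires MongoDB 3.4+ servers and drivers, newer than the setup described here):

```python
# Sketch: send reads to secondaries, but bound how stale they may be.
# The hosts, database/collection names, and 120-second threshold are
# illustrative; max_staleness needs MongoDB 3.4+ servers and drivers.
from pymongo import MongoClient
from pymongo.read_preferences import SecondaryPreferred

client = MongoClient(
    "mongodb://mongo-a.example.com,mongo-b.example.com,"
    "mongo-c.example.com/?replicaSet=rs0"
)

coll = client.mydb.get_collection(
    "docs",
    read_preference=SecondaryPreferred(max_staleness=120),
)

# Reads go to a secondary that is no more than ~2 minutes behind the
# primary; if none qualifies, the driver falls back to the primary.
doc = coll.find_one({"status": "published"})
```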