Start with an EBS-root-based instance. I've converted most of mine to these.
I did try to convert some existing instances to EBS-only, but after 3 or 4 hours I found
it was faster to just re-install all the needed binary packages and copy across our folks'
code, data, etc.
From https://console.aws.amazon.com/ec2/home?region=us-east-1#s=LaunchInstanceWizard
(the launch instance button),
click the "Viewing" drop-down that defaults to all images and pick EBS images.
There are many Fedora, Ubuntu, and Amazon Linux images to pick from. Note: all of these show
"Root Device: EBS"...
Boot it with your other choices: certs, region, architecture, etc.
Log in to it, customize it, and fix it up as you see fit.
Stop it. NOT TERMINATE.
Start it again, and everything on root is as you left it.
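The stop/start cycle can be scripted too. A hedged boto3 sketch (the instance ID is a placeholder):

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
instance_id = "i-0123456789abcdef0"  # placeholder

# Stop, NOT terminate -- the EBS root volume survives a stop.
ec2.stop_instances(InstanceIds=[instance_id])
ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])

# Later: start it again; everything on root is as you left it.
ec2.start_instances(InstanceIds=[instance_id])
ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])
```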
There are some startup scripts Amazon (or somebody) supplies that re-init /mnt each time,
but I just keep separate EBS backups of our base software.
This setup is ideal for us: we do not have huge load spikes, but instead have
occasional tasks that need 2x our regular hosts, so I've got half a dozen instances
that are STOPPED and not accruing any CPU charges (though they do accrue minuscule
storage charges for the stopped EBS volumes).
So this leaves you with a permanent root volume, not a transient one, and you stop
and start as you need.
From any of the EBS instances you can "boot more like this" if you need 20 in a hurry.
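Scripted, that bulk launch looks something like this boto3 sketch (the AMI ID and instance type are placeholders, and it assumes you have already registered an AMI from your customized instance):

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Launch 20 instances from the same EBS-backed image in one call.
ec2.run_instances(
    ImageId="ami-0123456789abcdef0",  # placeholder AMI
    InstanceType="m1.large",          # assumption: size as needed
    MinCount=20,
    MaxCount=20,
)
```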
Note 2: if you attach big EBS volumes to an EBS-based AMI and pick "boot more like this",
it makes copies of those attached volumes. That can make it slow to boot, and it can
also rack up unexpected storage charges with all these funky snapshots lying about.
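A quick way to spot those leftover snapshots, again a boto3 sketch of my own (not from the original answer):

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# List only the snapshots your account owns.
resp = ec2.describe_snapshots(OwnerIds=["self"])
for snap in resp["Snapshots"]:
    print(snap["SnapshotId"], snap["VolumeSize"], "GiB", snap["StartTime"])
```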
You can probably do this through the CLI tools as well, but I found the console easy enough.
First, if you take a snapshot, it will include the oplog - the oplog is just a capped collection living in the local database. Snapshots capture a point in time, and assuming you have journaling enabled (it is on by default), you do not need to do anything special for the snapshot to function as a backup.
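Taking the snapshot itself is one API call. A boto3 sketch (the volume ID is a placeholder, and this assumes the journal lives on the same EBS volume as the data files, so no fsync lock is needed):

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Point-in-time snapshot of the volume holding the mongod dbpath;
# it includes the data files, the journal, and the oplog.
ec2.create_snapshot(
    VolumeId="vol-0123456789abcdef0",  # placeholder
    Description="mongod dbpath backup (data + journal + oplog)",
)
```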
The only absolute requirement is that the EBS snapshot be recent enough to fall within your oplog window - that is, the last (most recent) operation recorded in the snapshot's oplog must still be present in the oplog of the current primary, so that the two can find a common point. If that is the case, it will work something like this:
- You restore a secondary from an EBS snapshot backup
- The mongod starts, looks for (and applies) any relevant journal files
- Next, the secondary connects to the primary and finds a common point in the two oplogs
- Any subsequent operations from the primary are applied on the RECOVERING secondary
- Once the secondary catches up sufficiently, it moves to the SECONDARY state and the backup is complete
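To watch those last two steps happen, you can poll the member's state. A pymongo sketch (host and port are placeholders):

```python
import time
from pymongo import MongoClient

# Connect directly to the restored member, not through the replica set.
client = MongoClient("mongodb://restored-host:27017", directConnection=True)

while True:
    status = client.admin.command("replSetGetStatus")
    me = next(m for m in status["members"] if m.get("self"))
    print(me["stateStr"])
    if me["stateStr"] == "SECONDARY":  # caught up; the restore is done
        break
    time.sleep(10)
```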
If the snapshot is not recent enough, then it can be discarded - without a common point in the oplog, the secondary will have to resync from scratch anyway.
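Checking whether a snapshot still falls inside the window is straightforward, since the oplog is an ordinary (capped) collection. A pymongo sketch (the snapshot timestamp is an assumption you would read from your own backup metadata):

```python
from datetime import datetime, timezone
from pymongo import MongoClient

client = MongoClient("mongodb://primary-host:27017")  # placeholder host
oplog = client.local["oplog.rs"]

# Oldest and newest entries in the capped oplog collection.
first = oplog.find().sort("$natural", 1).limit(1).next()
last = oplog.find().sort("$natural", -1).limit(1).next()
window_start = first["ts"].as_datetime()
window_end = last["ts"].as_datetime()
print("oplog window:", window_start, "->", window_end)

snapshot_time = datetime(2013, 1, 1, tzinfo=timezone.utc)  # placeholder
if snapshot_time >= window_start:
    print("recent enough: a restored member can find a common point")
else:
    print("too old: the member would have to resync from scratch")
```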
To answer your specific questions:
Do I need to record oplogs and use those in conjunction to restore
after a failure?
As explained above, if you snapshot, you already are backing up the oplog
Should I spin up another instance within the replica set specifically
for backups and snapshot that vs. taking snapshots of primary and
secondary? If so, we're back to the oplog issue aren't we?
There's no oplog issue beyond the common-point/window one I mentioned above. Some people do choose to have a secondary (usually hidden) for this purpose, to avoid adding load to a normal node. Note: even a hidden member gets a vote, so if you added one for backup purposes you could remove the arbiter from your config and still have 3 voting members.
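Configuring such a member might look like this pymongo sketch (the member index and host are assumptions for illustration; note that a hidden member must also have priority 0):

```python
from pymongo import MongoClient

client = MongoClient("mongodb://primary-host:27017")  # placeholder

cfg = client.admin.command("replSetGetConfig")["config"]
member = cfg["members"][2]   # assumption: the backup member's index
member["hidden"] = True      # invisible to client reads...
member["priority"] = 0       # ...and never eligible to become primary,
                             # but it still carries a vote by default
cfg["version"] += 1          # reconfig requires a bumped config version
client.admin.command({"replSetReconfig": cfg})
```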
Should I snapshot each replica volume and rely on the replica set
completely to cover the time between failure and the last snapshot?
Every member of a replica set is intended to be identical - the data is the same, and any secondary can become primary. These are not slaves: every replica set member contains the full oplog and all the data.
So taking multiple snapshots (assuming you trust the process) is going to be redundant (of course, you may want that redundancy). And yes, the whole intent of the replica set functionality is to ensure that you don't need to take extraordinary measures to use a secondary in this way (with the caveats above in mind, of course).
iostat numbers are what you really need to gauge the impact here, preferably over time and correlated with MongoDB load (so I would recommend MMS with the munin-node plugin for monitoring). If you are not seeing high latency in svctime or large queue sizes, then IO is probably not your bottleneck (and if it were, you would probably not have idle CPU - you would be seeing IOWait spike).
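For a quick look without full monitoring, you can sample iostat directly. A rough Python sketch (this assumes the sysstat iostat tool is installed; "xvdf" is a placeholder device name, and column names vary between sysstat versions):

```python
import subprocess

# Extended device stats: 5-second interval, 3 samples.
out = subprocess.run(
    ["iostat", "-xd", "5", "3"],
    capture_output=True, text=True, check=True,
).stdout

header = []
for line in out.splitlines():
    cols = line.split()
    if not cols:
        continue
    if cols[0].startswith("Device"):
        header = cols                   # remember the column names
    elif header and cols[0] == "xvdf":  # placeholder EBS device
        row = dict(zip(header, cols))
        # High await / queue sizes are the red flags mentioned above.
        print({k: row.get(k) for k in ("await", "avgqu-sz", "%util")})
```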
The general consensus from the various analyses out there has been that provisioned IOPS buy you consistency more than raw speed.
Since your IO does not appear to be your bottleneck (I am going on limited evidence/information here), I don't think you are going to see much more than the 100 PIOPS you currently see, but you should have more predictable performance.