Downtime for increasing AWS RDS storage

amazon-rds, amazon-web-services

I am looking to increase storage of two RDS instances (just the storage space allocated, not the instance type or other parameters). The documentation at https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_PIOPS.StorageTypes.html#USER_PIOPS.ModifyingExisting suggests:

You can change from standard storage to Provisioned IOPS storage, or
from Provisioned IOPS to standard storage, as well as increase
storage, with little to no downtime.

I would definitely schedule a maintenance window before performing the change, but the documentation is a little vague in this area. For anyone who has done this before: what does "little to no downtime" mean in practice? Can I expect 5 seconds, or is it more like 5 minutes?

Update July, 2019:

I've replaced the broken link above with the current AWS documentation. The newer documentation has a blurb that helps answer the original question as well:

In most cases, scaling storage doesn't require any outage and doesn't degrade performance of the server. After you modify the storage size for a DB instance, the status of the DB instance is Storage-optimization. The DB instance is fully operational after a storage modification. However, you can't make further storage modifications either for six hours or while the DB instance status is storage-optimization, whichever is longer.

However, a special case is if you have a SQL Server DB instance and haven't modified the storage configuration since November 2017. In this case, you might experience a short outage of a few minutes when you modify your DB instance to increase the allocated storage. After the outage, the DB instance is online but in the Storage-optimization state. Performance might be degraded during storage optimization.

Best Answer

First, note that you may be looking at the incorrect operation -- you describe that you want to change storage size, but have quoted documentation describing storage type. This is an important distinction: RDS advises that you won't experience an outage for changing storage size, but that you will experience an outage for changing storage type.

Expect degraded performance for changing storage size, the duration and impact of which will depend on several factors:

  • Your RDS instance type
  • Configuration
  • Whether the change is applied during a maintenance window
  • Whether the change is applied first to your Multi-AZ standby, followed by a failover
  • Current database size
  • Candidate database size
  • AWS capacity to handle this request at your requested time of day, at your requested availability zone, in your requested region
  • Engine type (for Amazon Aurora users, storage additions are managed by RDS as-needed in 10 GB increments, so this discussion is moot)

With this in mind, you would be better served by testing this yourself, in your environment, and on your terms. Try experimenting with the following:

  • Restoring a new RDS instance from a snapshot of your existing instance, and performing this operation on the new clone.
  • With this clone:
    • Increase the size at different times of day, when you would expect a different load on AWS.
    • Increase to different sizes.
    • Try it with Multi-AZ enabled, and see whether your real downtime changes compared to running without Multi-AZ.
    • Try it during a maintenance window, and compare it with applying the change immediately.
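
The experiment above could be scripted roughly as follows. This is a minimal sketch, assuming the boto3 RDS client and its `restore_db_instance_from_db_snapshot`, `modify_db_instance`, and `describe_db_instances` calls. The function names, identifiers, polling interval, and poll limit are hypothetical. The client is passed in as a parameter, so the same code can be run against a stub before touching a real account.

```python
import time


def wait_for_status(rds, instance_id, wanted, poll_seconds=15,
                    max_polls=480, sleep=time.sleep):
    """Poll describe_db_instances until the instance reports a status in `wanted`."""
    for _ in range(max_polls):
        resp = rds.describe_db_instances(DBInstanceIdentifier=instance_id)
        status = resp["DBInstances"][0]["DBInstanceStatus"]
        if status in wanted:
            return status
        sleep(poll_seconds)
    raise TimeoutError(f"{instance_id} never reached any of {sorted(wanted)}")


def time_storage_resize(rds, snapshot_id, clone_id, new_size_gb,
                        multi_az=False, poll_seconds=15,
                        sleep=time.sleep, clock=time.monotonic):
    """Restore a clone from a snapshot, grow its storage, and return the seconds
    from the modify call until the clone is no longer in the "modifying" state."""
    rds.restore_db_instance_from_db_snapshot(
        DBInstanceIdentifier=clone_id,
        DBSnapshotIdentifier=snapshot_id,
        MultiAZ=multi_az,
    )
    wait_for_status(rds, clone_id, {"available"}, poll_seconds, sleep=sleep)

    started = clock()
    rds.modify_db_instance(
        DBInstanceIdentifier=clone_id,
        AllocatedStorage=new_size_gb,  # in GiB
        ApplyImmediately=True,         # do not wait for the maintenance window
    )
    # Assumption: the status may still read "available" for a moment after the
    # call, so first wait for the change to start, then for it to finish.
    wait_for_status(rds, clone_id, {"modifying", "storage-optimization"},
                    poll_seconds, sleep=sleep)
    wait_for_status(rds, clone_id, {"available", "storage-optimization"},
                    poll_seconds, sleep=sleep)
    return clock() - started
```

With a real client this would be called as `time_storage_resize(boto3.client("rds"), "my-snapshot", "resize-test-clone", 50)`, where both identifiers are placeholders. The returned figure is the time the operation took to apply, not downtime, and the clone keeps accruing charges until it is deleted.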

This will cost a bit more (though not much: you could do most of it in 1-3 instance-hours), but you will get a much cleaner answer than asking for our experiences across a myriad of different RDS environments.

If you're still looking for a "ballpark" answer, I would advise planning for at least some performance degradation on the order of minutes, not seconds. Again, this depends very much on your environment and configuration.

For reference, I most recently applied this exact operation to add 10GB to a 40GB db.m1.small type instance on a Saturday afternoon (in EST). The instance remained in a "modifying" state for approximately 17 minutes. Note that the modifying state does not describe real downtime, but rather the duration that the operation is being applied. You won't be able to apply additional changes to the actual instance (although you can still access the DB itself) and this is also the duration that you can expect any performance degradation to occur.
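
Since the "modifying" state is not the same as downtime, one way to measure real unavailability is to probe the database endpoint while the change is being applied. The following is a minimal sketch, assuming that a plain TCP connect to the instance endpoint is an adequate health check. It only shows that the port accepts connections, not that queries succeed or perform well. The function names, probe interval, and timeout are hypothetical.

```python
import socket
import time


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def probe(host, port, duration_seconds, interval_seconds=1.0):
    """Probe the endpoint at a fixed interval, collecting (timestamp, ok) samples."""
    samples = []
    deadline = time.monotonic() + duration_seconds
    while time.monotonic() < deadline:
        samples.append((time.monotonic(), port_is_open(host, port)))
        time.sleep(interval_seconds)
    return samples


def longest_outage(samples):
    """Return the longest span, in seconds, from a first failed probe to the next
    successful one. An outage still open at the end runs to the last sample."""
    longest = 0.0
    down_since = None
    for ts, ok in samples:
        if not ok and down_since is None:
            down_since = ts            # outage begins
        elif ok and down_since is not None:
            longest = max(longest, ts - down_since)
            down_since = None          # outage ends
    if down_since is not None and samples:
        longest = max(longest, samples[-1][0] - down_since)
    return longest
```

Starting `probe(endpoint, port, duration_seconds=1800)` just before applying the change, with your own endpoint and port, and then passing the result to `longest_outage` gives a rough upper bound on connection-level downtime, accurate to about one probe interval plus the connect timeout.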

Note: If you're only changing the storage size, an outage is not expected. One can still occur, however, if the change is made in conjunction with other operations such as changing the instance identifier, instance class, or storage type.