Sql-server – How should I perform database maintenance on a 24×7 system

high-availabilitymaintenancesql server

I'm a software developer who inherited a part-time DBA role. I'm responsible for an application backend by a small, high-volume 24×7 database on SQL Server 2008.

While there's other stuff in the DB, the critical piece is a 50GB, 7.5M row table that serves 100K requests/sec during peak load, and about half that at "night". This is 99%+ read traffic, but the writes are constant, and required.

I need to be able to perform periodic maintenance without a maintenance window. Say an index rebuild, a job to purge old data, Windows Update, or hardware upgrade. Most of the advice I've seen is along the lines of "MAKE a maintenance window." While I appreciate the sentiment, I hope there's another way. If it will solve this problem, I do have the ability to purchase new hardware or modify the database, the clients (a set of web services servers), and much of the application code (ADO.NET + ASP.NET).

I've been thinking along the lines of using the warm spare (or a 3rd server) to do the maintenance, and then "swap" it into production.
1 Synchronize the spare by restoring backups, including a current transaction log.
2 Perform the maintenance tasks.
3 Reconfigure clients to connect to the spare server. Existing connections are finished within a minute or so.
4 The spare server is now the production server.

The problem remaining is that the new production server is now out of date by however long it took to perform maintenance. Is there some way that the original production server can be made to queue up changes and merge them to the spare between steps 2 and 3? Any other ideas?

Best Answer

You will need to read up on the High Availability features offered by SQL 2008. See here for the white paper. There are just too many scenarios to cover.

[climbing on soap box]

You do need maintenance windows though. That does not imply weekly or even monthly, but there will be times you need the server down to make improvements. There are very few systems that are 100% operable over long periods (years), or that need to be. The purpose of maintenance windows is to minimize the downtime and make it predictable, so the business can cope with it. I would advise against dismissing maintenance windows all together. If you do change your configuration by moving to clustering or the like you will certainly have some down time just during the transition, however small. [climbing back down]

Some options:

  1. Mirroring
  2. Clustering
  3. Log Shipping
  4. Online index operations
  5. archiving using online index operations and partitioned tables
  6. Peer to Peer replication

My 2 cents.