Database – How to off load work from the database

cachingdatabaseperformancescalability

In at least the web development field, web servers are everywhere but are backed by very few database servers. As web servers get hit with requests they execute large queries on the database, putting the server under heavy load. While web servers are very easy to scale, db servers are much harder (at least from what I know), making them a precious resource.

One way I've heard from some answers here to take load off the database is to offload the work to the web server. Easy, until you realize that now you've got a ton of traffic internally as a web server tries to run SELECT TOP 3000 (presumably to crunch the results on its own), which still slows things down.

What are other ways to take load off the database?

Best Answer

Here is a list of standard options.

  1. Optimize the access to the database to only do what you need, efficiently. A good DBA can help here a lot. This is a basic step that most companies do.

  2. Cache data away from the database using something like memcached. This is usually done at the application layer, and is highly effective. Virtually every competent website should do this.

  3. More ambitiously, maintain read-only copies of the database, and direct queries there when possible. On the database side the necessary technology is called "replication" and the read-only copies are often also backups for failover from the main database. If you're doing a million dynamic pages per hour, odds are that you are doing this, or have thought about it.

  4. Buy really, really expensive hardware for the database. I know that PayPal did this as of 4 years ago, and changing their architecture would have been difficult so they possibly still are.

  5. Shard the database into multiple pieces with ranges of data. This is a very intrusive change into application design. A well-known example of a company that does this is eBay.

  6. Try to use a database that scales onto multiple machines. Oracle RAC scales onto clusters, but doesn't let you distribute data widely. Other offerings exist that are supposed to be easier to distribute, including Microsoft's SQL Azure and FathomDB. I have not used those offerings and don't know how well they work. I suspect better than nothing, but I doubt they scale horizontally that well.

  7. Relational databases generally try to provide ACID guarantees. But the CAP theorem makes it very difficult to do that in a distributed system, particularly while letting you do things like join data. Therefore people have come up with many NoSQL alternatives that explicitly offer weaker guarantees and avoid problematic operations in return for fully distributed scalability. Well-known examples of companies that use scalable NoSQL data stores include Google, Facebook and Twitter.