Tips for Designing a Scalable MySQL Database

database-designMySQLscalability

As the question states, I am a developer, not a DBA. I have experience with designing good ER schemas and am fairly knowledgeable about normalization and good schema design. I have also worked with data warehouses that use dimensional modeling with fact tables and dim tables.

However, all of the database-driven applications I've developed at previous jobs have been internal applications on the company's intranet, never receiving "real-world traffic". Furthermore, at previous jobs, I have always had a DBA or someone who knew much more than me about these things.

At this new job I just started, I've been asked to develop a public-facing application with a MySQL backend and the data stored by this application is expected to grow very rapidly. Oh, and we don't have a DBA. Well, I guess I am the DBA. 😉

As far as designing a database to be scalable, I don't even know where to start. Does anyone have any good tips or know of any good educational materials for a developer who has been sort of shoved into a DBA/database designer role and has been tasked with designing a scalable database to support an application like this? Have any other developers been through this sort of thing? What did you do to quickly become good at this role?

I've found some good slides on the subject here but it's hard to glean details from slides. Wish I could've attended that guy's talk.

I also found a good blog entry called 5 Ways to Boost MySQL Scalability which had some good information, though some of it was over my head.

tl;dr

I just want to make sure the database doesn't have to be completely redesigned when it scales up, and I'm looking for tips to get it right the first time. The answer I'm looking for is a "list of things every developer should know about making a scalable MySQL database so your application doesn't perform like crap when the data gets huge".

Best Answer

I think the usual rules apply:

  • Keep tables small (don't waste space unnecessarily).
  • Don't query for more information than you need.
  • If you use ORMs, beware of common pitfalls such as the N+1 problem.
  • Stay away from troublesome operators (e.g. Like '%Smith%').
  • Design your indexes intelligently and make sure they cover the majority of the uses (here is a decent, if light, treatment of indexes).
  • Remember that set-based querying is usually far superior in terms of performance than iterating through the data.
  • Know when to denormalise data for performance reasons.
  • Cache whatever can be cached (economically) to relieve the DB.

Of course, vertical scaling can only get you so far, and then you might have to look into horizontal scaling. Having said that, a good single-DB design can still take you very far - as far as I know, StackOverflow is still running a single DB instance. If you think you will need to handle far more data than that, consider sharding (or alternative DBs) early on.