Google-app-engine – Bigtable database design theory

Tags: bigtable, database-design, google-app-engine

I am very well versed in the theory and practice of relational database design.

I know what works and what doesn't, what is performant and what is maintainable (almost; there's always room to tweak once you start having real data).

However, I can't seem to find a substantial body of knowledge about distributed, scalable databases such as Google's Bigtable (for writing apps on Google App Engine). What works, what doesn't, what will scale and what won't, and why?

Sure, there are some blog posts and articles, but are there books or academic research papers on designing databases for Bigtable and similar database paradigms?

Best Answer

... are there books or academic research papers on designing databases for bigtable and similar database paradigms?

Well, Bigtable is essentially a database itself, so I take it your question is more about how to model data and, to some extent, design your schema in these Bigtable-like databases. More specifically, you would like to know how to do this on Google App Engine.
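To make "Bigtable-like" a bit more concrete: the Bigtable paper describes a table as a sparse, sorted map indexed by row key, column key, and timestamp. The in-memory sketch below is a hypothetical toy (the `SortedTable` class and the `user#...#post#...` key scheme are illustrative assumptions, not any real client library) showing why row-key design takes over much of the role of relational schema design: because rows are stored sorted by key, everything sharing a key prefix can be read with one contiguous scan instead of a join.

```python
import bisect
from typing import Dict, Iterator, List, Optional, Tuple


class SortedTable:
    """Toy model: row key -> column -> timestamp -> value, rows kept sorted."""

    def __init__(self) -> None:
        self._rows: List[str] = []  # row keys in sorted order
        self._cells: Dict[str, Dict[str, Dict[int, str]]] = {}

    def put(self, row: str, column: str, value: str, ts: int) -> None:
        if row not in self._cells:
            bisect.insort(self._rows, row)  # keep row keys sorted on insert
            self._cells[row] = {}
        # Each cell can hold several versions, keyed by timestamp.
        self._cells[row].setdefault(column, {})[ts] = value

    def get(self, row: str, column: str) -> Optional[str]:
        versions = self._cells.get(row, {}).get(column)
        if not versions:
            return None
        return versions[max(versions)]  # newest version wins

    def scan_prefix(self, prefix: str) -> Iterator[Tuple[str, Dict[str, str]]]:
        # Sorted row keys make all rows sharing a prefix contiguous:
        # seek to the first key >= prefix, stop at the first non-match.
        i = bisect.bisect_left(self._rows, prefix)
        while i < len(self._rows) and self._rows[i].startswith(prefix):
            row = self._rows[i]
            latest = {col: v[max(v)] for col, v in self._cells[row].items()}
            yield row, latest
            i += 1


# Hypothetical key scheme: the "user -> posts" access path lives in the row key.
table = SortedTable()
table.put("user#alice#post#0002", "body", "second post", ts=1)
table.put("user#alice#post#0001", "body", "first post", ts=1)
table.put("user#bob#post#0001", "body", "hello", ts=1)
table.put("user#alice#post#0001", "body", "first post (edited)", ts=2)

for row, cols in table.scan_prefix("user#alice#"):
    print(row, cols)
```

The design choice this illustrates is putting the access path into the row key (often alongside denormalized data); the trade-off is that a query the key wasn't designed for needs either a full scan or an extra index you maintain yourself.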

With GAE you will be using the Datastore API, which adds a significant layer of abstraction on top of Bigtable, so to some extent you don't have to worry about the low-level details you would if you were using something like HBase. There are a few posts on SO (including a great answer by a Google engineer who I think is part of the GAE team) that will guide you and offer hints on how to approach this new type of database system.
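As a rough illustration of the kind of abstraction meant here, the sketch below builds a tiny entity API on top of sorted rows: each `put` writes the entity plus one index row per property, and an equality query becomes a range scan over those index rows. `ToyEntityStore`, `put`, and `query_eq` are hypothetical names, and the whole thing is a simplified assumption about the general approach, not the actual Datastore implementation or its API.

```python
import bisect
from typing import Dict, List, Tuple

# Hypothetical toy entity layer; NOT the real App Engine Datastore.
IndexRow = Tuple[str, str, str, str]  # (kind, property, value, entity_id)


class ToyEntityStore:
    def __init__(self) -> None:
        # "Entities table": (kind, id) -> properties.
        self._entities: Dict[Tuple[str, str], Dict[str, str]] = {}
        # "Index table": rows kept sorted so each query is one range scan.
        self._index: List[IndexRow] = []

    def put(self, kind: str, entity_id: str, props: Dict[str, str]) -> None:
        key = (kind, entity_id)
        # Remove index rows for the entity's previous property values.
        for name, value in self._entities.get(key, {}).items():
            self._index.remove((kind, name, value, entity_id))
        self._entities[key] = dict(props)
        # Write one sorted index row per property (extra work at write time).
        for name, value in props.items():
            bisect.insort(self._index, (kind, name, value, entity_id))

    def query_eq(self, kind: str, prop: str, value: str) -> List[Dict[str, str]]:
        # Seek to the first index row for (kind, prop, value); "" sorts before
        # any non-empty entity id. Then read rows until the prefix changes.
        i = bisect.bisect_left(self._index, (kind, prop, value, ""))
        results = []
        while i < len(self._index) and self._index[i][:3] == (kind, prop, value):
            results.append(self._entities[(kind, self._index[i][3])])
            i += 1
        return results


store = ToyEntityStore()
store.put("Greeting", "g1", {"author": "alice", "content": "hi"})
store.put("Greeting", "g2", {"author": "bob", "content": "yo"})
store.put("Greeting", "g3", {"author": "alice", "content": "hey"})
print([e["content"] for e in store.query_eq("Greeting", "author", "alice")])
```

The point of the sketch is where the work goes: indexing cost moves from read time to write time, so every write touches several rows while reads stay cheap, ordered scans as the data grows.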

Helpful Info:

  1. HBase was inspired by Google's Bigtable paper
  2. Hypertable was also inspired by the Bigtable paper
  3. Cassandra's data model was inspired by the Bigtable paper
  4. Hadoop was inspired by Google's GFS and MapReduce papers