MySQL : table organisation for very large sets with high update frequency

indexingMySQLperformancescaling

I'm facing a dilemma in the choice of my MySQL schema application. So before I start here is a picture extremely simplified of my database :

Schema here : http://i43.tinypic.com/2wp5lxz.png

In one sentence : for each customer, the application harvest text data and attached tags to each data collected.

As approximation of the usage of each table, here is what I expect :

  • customer : ~5000, shouldn't grow fast
  • data : 5 millions per customer, could double or triple for big customers.
  • tag : ~1000, quite fixed size
  • data_tag : hundred of millions per customer easily. Each data can be tagged a lot.

The harvesting process is permanent, that means that around every 15 minutes new data come and are tagged, that require a very constant index refreshing.

A lot of my queries are a SELECT COUNT of DATA between specific DATES and tagged with a specific TAG on a specific CUSTOMER (very rarely it will involve several customers).

Here is the situation, you can imagine with this kind of volume of data I'm facing a challenge in term of data organization and indexing. Again, it's a very minimalistic and simplified version of my structure. My question is, is it better:

  1. to stick with this model and to manage crazy index optimization ? (which involves potentially having billions of rows in the data_tag table)
  2. change the schema and use one data table and one data_tag table per customer ? (which involves having 5000 tables on my database)

I'm running all of this on a MySQL 5.0 dedicated server (quad-core, 8Go of ram) replicated. I only use InnoDB, I also have another server that run Sphinx. So knowing all of this, I can't wait to hear your opinion about this.

Thanks.


edit

Thanks to your answers I realize how crazy this numbers are. So here is an updated more realistic usage of the tables (based on the actual server that is just a basic rackspace box).

  • customer : 2000 (fixed)
  • data : 1 million per customer (fixed, archiving of older data. And very unfair : some customers have few thousand, the biggest 5 million)
  • tag : 1000 (fixed)
  • data_tag : ~3 or 5 millions per customer (depend on data, so unfair too).

Thank you.

Best Answer

My 2-cents based on my experience using MySQL for many years is that your latter option sounds more logical and realistic.

Going with one Data and one data_tag per customer has simpler overall manageability than your current schema. Coding for your second option will be simpler as well.

You can ask many more MySQL experts; your second option is the best.

I can go into detail if you like, this is a simple answer for a simplified question to a big issue. it goes both ways

Related Topic