Database Design – How to Store Data Recorded with Different Frequencies

Tags: data, data-structures, database-design

I'm writing a metallurgical web application that must store data (electrical parameters) recorded at different frequencies. For example:

  • six parameters recorded at a frequency of one measurement per second,
  • three parameters recorded at a frequency of ten measurements per second.

(all parameters are decimals with six digits following the decimal point — DECIMAL(10,6)).

How should I design the database structure to store such values efficiently, and to be able to query these datasets easily for presentation on any kind of graph?

Possibilities (ideas?) I have come up with so far:

  1. Store the two parameter sets in two different tables, with one record per corresponding time unit, store all parameters in columns declared as above (decimals), and query one record from the first table and ten records from the second table per point on the drawn graph. That makes 11 records per second. (A DDL sketch of both layouts follows this list.)

  2. Store all parameters in one table, with one record per second, store each parameter from the first group in a decimal column and each parameter from the second group as an array of decimals, and query only one record from this table per point on the drawn graph. That makes 1 record per second.

  3. Something different (if both of the above are wrong).
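For illustration only, here is a minimal MySQL DDL sketch of the two layouts described in options 1 and 2. All table and column names are made up, and the JSON columns in option 2 are just one way to approximate an array, since MySQL has no native array type (JSON requires MySQL 5.7+):

```sql
-- Option 1: two tables, one row per measurement
-- (1 row/second in the first, 10 rows/second in the second).
CREATE TABLE measures_1s (
    device_id   INT UNSIGNED  NOT NULL,
    measured_at TIMESTAMP     NOT NULL,
    param_1     DECIMAL(10,6) NOT NULL,
    param_2     DECIMAL(10,6) NOT NULL,
    param_3     DECIMAL(10,6) NOT NULL,
    param_4     DECIMAL(10,6) NOT NULL,
    param_5     DECIMAL(10,6) NOT NULL,
    param_6     DECIMAL(10,6) NOT NULL,
    PRIMARY KEY (device_id, measured_at)
);

CREATE TABLE measures_100ms (
    device_id   INT UNSIGNED  NOT NULL,
    measured_at TIMESTAMP     NOT NULL,   -- the "parent" second
    sub_sample  TINYINT       NOT NULL,   -- 0..9 within that second
    param_7     DECIMAL(10,6) NOT NULL,
    param_8     DECIMAL(10,6) NOT NULL,
    param_9     DECIMAL(10,6) NOT NULL,
    PRIMARY KEY (device_id, measured_at, sub_sample)
);

-- Option 2: one table, one row per second; each 10-per-second parameter
-- is packed into a single column holding all ten values for that second.
CREATE TABLE measures (
    device_id   INT UNSIGNED  NOT NULL,
    measured_at TIMESTAMP     NOT NULL,
    param_1     DECIMAL(10,6) NOT NULL,
    param_2     DECIMAL(10,6) NOT NULL,
    param_3     DECIMAL(10,6) NOT NULL,
    param_4     DECIMAL(10,6) NOT NULL,
    param_5     DECIMAL(10,6) NOT NULL,
    param_6     DECIMAL(10,6) NOT NULL,
    param_7_arr JSON          NOT NULL,   -- e.g. [1.000001, 1.000002, ...]
    param_8_arr JSON          NOT NULL,
    param_9_arr JSON          NOT NULL,
    PRIMARY KEY (device_id, measured_at)
);
```

With option 2 the application has to unpack the packed arrays itself when drawing a graph, which is the "slower parsing" trade-off discussed below.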

I have no idea which solution would be better in terms of performance and data redundancy.

The first solution seems to consume much more database space, because you have eleven records per second of measurements (and we may be dealing with thousands of devices, each sending measurements every second), but at the same time it seems faster to parse, because you have direct access to each value in each record and each column. In the second case you should get a much smaller database size (since there is one record per second and per device), but parsing the data seems slower, because you have to unpack an array of ten values per second for each parameter measured every 100 ms (i.e. ten times per second).

If my assumptions above are correct, the question narrows down to: what matters more in today's database design, database size or parsing time? For low-cost solutions (like shared hosting or low-end VPSes) I'd assume that database size matters far more, and therefore I'd go with the second option.

I'm mainly focused on using MySQL, but I can accept nearly any other kind of database if there is an advantage to using a different RDBMS.

Additional information:

  1. There is no need (at least in my case) to store the microtime in the DB for the second group of parameters (i.e. those recorded every 100 ms). They only need to be correctly linked to the "second" (timestamp) under which the other parameters (recorded once per second) are stored.

  2. There is no need to store each parameter in a separate record in the database. As my ideas show, all parameters can be stored in one record per measurement.

Best Answer

Probably the simplest solution is to collect the 1/10th-second data in a cache and write it together with the 1-second fact when that fact arrives. Storing both values at the same time removes the complexity of dealing with summary tables and the related cleanup tasks. A good way to handle the 1/10th-second value is to add up all the samples and divide by the number collected when the 1-second fact arrives. You can also keep other statistics of interest on the 1/10th-second fact, such as min, max, and stddev, depending on whether the 1/10th-second values carry information like spikes. A sample table design might be (fact 1 is the 1-second fact, fact 2 is the 1/10th-second fact): timestamp, fact_1, fact_2_avg, fact_2_min, fact_2_max, fact_2_count, fact_2_stddev.
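Read as DDL, that design might look like the sketch below; the column names follow the answer, while the table name, key, and types are assumptions:

```sql
-- fact_1 holds the 1-second parameter; the fact_2_* columns summarize
-- the ten 1/10th-second samples collected during that same second.
CREATE TABLE measurements (
    measured_at   TIMESTAMP     NOT NULL,
    fact_1        DECIMAL(10,6) NOT NULL,
    fact_2_avg    DECIMAL(10,6) NOT NULL,
    fact_2_min    DECIMAL(10,6) NOT NULL,
    fact_2_max    DECIMAL(10,6) NOT NULL,
    fact_2_count  SMALLINT      NOT NULL,  -- normally 10, fewer if samples were missed
    fact_2_stddev DECIMAL(10,6) NOT NULL,
    PRIMARY KEY (measured_at)
);
```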

After this is in place, you might consider the granularity of the data you need over time, and consider rolling both values up before writing them to the database, so that you only write every minute, every 5 minutes, or whatever interval is configured for the table.
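The answer suggests doing this roll-up in the collector before writing; if the per-second rows do get stored first, the same aggregation can also be expressed in SQL afterwards. A hypothetical 5-minute roll-up over the measurements table sketched above might look like this (measurements_5min is an invented name):

```sql
-- Collapse the last 5 minutes of per-second rows into one 5-minute row.
INSERT INTO measurements_5min
    (bucket_start, fact_1_avg, fact_2_avg, fact_2_min, fact_2_max)
SELECT
    FROM_UNIXTIME(FLOOR(UNIX_TIMESTAMP(measured_at) / 300) * 300) AS bucket_start,
    AVG(fact_1),
    AVG(fact_2_avg),
    MIN(fact_2_min),
    MAX(fact_2_max)
FROM measurements
WHERE measured_at >= NOW() - INTERVAL 5 MINUTE
GROUP BY bucket_start;
```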

Also, as the system runs for months or years, consider whether you need to know what the value was at, say, 3/15/2012 12:03.55. If you don't, consider summary levels like those you might see in MRTG, where you have a live view for 24 hours, a 5-minute summary at the weekly view, a 1-hour view at 30 days, and so on. This lets you keep data and maintain fast performance and trend monitoring without having to query through 31 million rows of data per year.
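A sketch of how such tiered retention could be enforced with plain DELETEs, assuming one table per summary level (all names are hypothetical, continuing the sketches above):

```sql
-- Raw 1-second rows: keep 24 hours; 5-minute rows: keep a week;
-- 1-hour rows: keep 30 days. Run these from a scheduled job (e.g. cron).
DELETE FROM measurements      WHERE measured_at  < NOW() - INTERVAL 1 DAY;
DELETE FROM measurements_5min WHERE bucket_start < NOW() - INTERVAL 7 DAY;
DELETE FROM measurements_1h   WHERE bucket_start < NOW() - INTERVAL 30 DAY;
```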

Another trick for dealing with data like this is to use a 24-hour rolling table. The key uses only hour, minute, and second, with no date, so you get a free rolling 24-hour table. Just watch out for gaps (clear the data between your last sample and the current one, so you don't keep stale values between your inserted rows if you miss a second or two while collecting data). Think of it as an update instead of an insert and you'll understand how this table works.
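One possible reading of that trick in MySQL terms uses a time-of-day key and an upsert, so each slot is overwritten once a day (table and column names are assumptions):

```sql
-- The key is only the time of day, so the table can never grow past
-- 86,400 rows: each new day simply overwrites yesterday's values.
CREATE TABLE measurements_rolling_24h (
    time_of_day TIME          NOT NULL,   -- hour:minute:second, no date
    fact_1      DECIMAL(10,6) NOT NULL,
    fact_2_avg  DECIMAL(10,6) NOT NULL,
    PRIMARY KEY (time_of_day)
);

-- "An update instead of an insert": write the current second's values,
-- replacing whatever was stored for this time of day 24 hours ago.
INSERT INTO measurements_rolling_24h (time_of_day, fact_1, fact_2_avg)
VALUES (CURTIME(), 12.345678, 12.340000)
ON DUPLICATE KEY UPDATE
    fact_1     = VALUES(fact_1),
    fact_2_avg = VALUES(fact_2_avg);
```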
