Best way to query immutable MySQL Data

immutabilityMySQLquery

I have been increasingly using immutable data storing for my web-app, due to some advantages it has (simpler writes, ability to view history / activity).

I am doing it using MySQL dabatases, simply by only doing inserts; no updates/deletes; and with the id (primary key) on each row being an autoincrementing kind of 'transaction id', and then with an 'entity id' column with an appropriate name.

The problem I'm having is massively verbose MySQL queries for certain desired latest data. With your standard mutable MySQL database, you don't have to worry about getting the latest row for each table (because it's always latest as they just use updates). With immutable data I'm having to do this huge queries full of self-joins, so it returns only the subset of data which has the highest id.

For instance:

select ...
...
inner join (
    /* latest */
    select y.* from (
      select id,forum_id,optional, max(id) as max_id
      from forum_opt
      group by forum_id
      ) as x
      inner join forum_opt as y
      on y.id = x.max_id
) as forumopt on ...

Whereas before it would just be:

select ...
...
inner join forum_opt on ...

So is there a cleaner way I can implement this? Is there something I'm not understanding about this software design concept or something I'm doing wrong?

Best Answer

Yes, there is a cleaner way, it is called temporal databases, in particular "transaction-time".

The idea is to use an interval value [trans_start, trans_end[to denote, when a row was stored or changed in the database (trans_start) and how long it stayed this way (trans_end). Using a now between trans_start and trans_end and now <> trans_end condition, you're able to get the current values in your database table. This model trivially allows "as-of " queries, where your approach would make that much more difficult. Not so trivial, this model allows full-temporal joins, which gives the complete change history of a join of two or more tables with history (even if the rows mutated independently).

Joins are then more easier to formulate, but will contain subqueries as in your example, but min-operators are avoided. YMMV with execution plans, if the database applies the join and then the conditions or if first the conditions and then the join is applied.

Some databases provide this functionality using additional options, e.g. In Oracle this is called FlashBack-query.

Check out Snodgras's "Developing time-oriented database applications with SQL", which is available from his (linked) page as PDF (the first reference).

Related Topic