MySQL – Inner Join versus Union All

Tags: inner-join, MySQL, sql, union

Which version of the query is faster, and which is best practice? (Asking out of curiosity.)

More importantly, are they equivalent?

Do these queries accomplish the same thing in this example?

1) INNER JOIN with two OR conditions:

SELECT
  DISTINCT (cat.id) as 'Cat ID:'
FROM
  cat
INNER JOIN cuteness_showdown ON
  (cat.id = cuteness_showdown.cat_1 OR cat.id = cuteness_showdown.cat_2);

2) Query each column separately and UNION ALL:

SELECT
  DISTINCT (table_1.id) as 'Cat ID:'
FROM
  (SELECT
    cuteness_showdown.cat_1 AS id
  FROM
    cuteness_showdown
  UNION ALL
  SELECT
    cuteness_showdown.cat_2 AS id
  FROM
    cuteness_showdown) AS table_1;

Now, which version is faster / best practice if I need a column from another table?

1) INNER JOIN with two OR conditions (no change):

SELECT
  DISTINCT (cat.id) as 'Cat ID:',
  cat.name as 'Cat Name:'
FROM
  cat
INNER JOIN cuteness_showdown ON
  (cat.id = cuteness_showdown.cat_1 OR cat.id = cuteness_showdown.cat_2);

2) Query each column separately and UNION ALL (needed to INNER JOIN cat table):

SELECT
  DISTINCT (table_1.id) as 'Cat ID:',
  cat.name as 'Cat Name:'
FROM
  (SELECT
    cuteness_showdown.cat_1 AS id
  FROM
    cuteness_showdown
  UNION ALL
  SELECT
    cuteness_showdown.cat_2 AS id
  FROM
    cuteness_showdown) AS table_1
INNER JOIN cat on
  (table_1.id = cat.id);

Best Answer

To find out which is faster, break out a terminal, write a script that runs each query 1,000 times, and compare the timings :)
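
If you would rather stay inside the mysql client than write an external script, a rough sketch of the same idea is a stored procedure that runs one version in a loop and reports the elapsed time. The procedure name time_or_join and the variable names are made up for illustration, the DATETIME(6) column type assumes MySQL 5.6.4 or later, and wrapping the query in COUNT(*) is only there to avoid sending 1,000 result sets back to the client, so it may change the measured work slightly.

```sql
-- Hypothetical helper: runs the OR-join version `runs` times on the server
-- and returns the total elapsed time in milliseconds.
DROP PROCEDURE IF EXISTS time_or_join;

DELIMITER //
CREATE PROCEDURE time_or_join(IN runs INT)
BEGIN
  DECLARE i INT DEFAULT 0;
  DECLARE rows_seen INT;
  DECLARE started_at DATETIME(6);

  -- SYSDATE() returns the time at which it executes, so it is safe to
  -- call it again at the end of the procedure to get the elapsed time.
  SET started_at = SYSDATE(6);

  WHILE i < runs DO
    -- COUNT(*) ... INTO keeps the result on the server instead of
    -- returning one result set per iteration.
    SELECT COUNT(*) INTO rows_seen
    FROM (
      SELECT DISTINCT cat.id
      FROM cat
      INNER JOIN cuteness_showdown
        ON cat.id = cuteness_showdown.cat_1
        OR cat.id = cuteness_showdown.cat_2
    ) AS result;
    SET i = i + 1;
  END WHILE;

  SELECT TIMESTAMPDIFF(MICROSECOND, started_at, SYSDATE(6)) / 1000 AS total_ms;
END //
DELIMITER ;

CALL time_or_join(1000);
```

To compare, create a second procedure with the UNION ALL version in place of the inner SELECT and call both a few times. This only measures server-side execution, so it leaves out network and client overhead that an external script would include.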

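As for whether they are equivalent: the query optimiser will very often come up with exactly the same execution plan for different SQL queries that do the same thing, so they may well be. I can't tell you whether these particular queries get that treatment, but you can use EXPLAIN to see the execution plans for yourself and compare them, assuming you have some data. Bear in mind that an identical plan only tells you about performance. The results match only if every cat_1 and cat_2 value refers to an existing cat.id, because the first UNION ALL version never touches the cat table and would also return NULLs or ids that have no matching row in cat.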
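
As a minimal sketch of that comparison, prefix each version from the question with EXPLAIN and run them against the same data. The table and column names are the ones from the question; the plans you get back depend on your indexes, row counts, and MySQL version, so no output is shown here.

```sql
-- Plan for version 1: INNER JOIN with an OR condition.
EXPLAIN
SELECT DISTINCT cat.id AS 'Cat ID:'
FROM cat
INNER JOIN cuteness_showdown
  ON cat.id = cuteness_showdown.cat_1
  OR cat.id = cuteness_showdown.cat_2;

-- Plan for version 2: one SELECT per column, combined with UNION ALL.
EXPLAIN
SELECT DISTINCT table_1.id AS 'Cat ID:'
FROM (
  SELECT cuteness_showdown.cat_1 AS id FROM cuteness_showdown
  UNION ALL
  SELECT cuteness_showdown.cat_2 AS id FROM cuteness_showdown
) AS table_1;
```

Things worth comparing between the two outputs are the type, key, rows, and Extra columns, for example whether an index is used for each lookup or whether a temporary table appears. On MySQL 5.6 and later, EXPLAIN FORMAT=JSON gives a more detailed view, and on MySQL 8.0.18 and later, EXPLAIN ANALYZE actually runs the query and reports measured timings.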

If the execution plans are indeed the same, best practice is to choose the more readable statement, so that anyone else who comes along to maintain the code can do so easily. If they are not the same, you have to decide whether a harder-to-read statement is worth the performance gain, which depends on how big a deal performance is in your project. I'd argue that if you have a relatively small DB which is unlikely to scale much, and sub-10 ms response times, then performance isn't an issue, so just make it easy to maintain.