MySQL 5.7: Improved Performance of Queries with Derived Tables

In a recent blog post my colleague Roy presented his work for MySQL 5.7 that makes the query optimizer merge sub-queries in the  FROM clause (so-called derived tables) with the outer query.  I will in this blog post show an example of how this work improves the performance of a MySQL query.

Using the schema from the  DBT-3 database, we will look at the following query containing two derived tables:

Prior to 5.7, MySQL would have to execute each subquery individually and store the results in two temporary tables that are then joined together:

Query Plan in MySQL 5.6
Query Plan in MySQL 5.6

The above diagram was generated by the Visual EXPLAIN feature in MySQL Workbench. Each subquery is within a box that represent a temporary table. The Visual EXPLAIN diagram shows that the temporary table for the derived table t2 will be scanned, and for each row, an index lookup into the temporary table for t1 will be made. This can also be seen from the traditional tabular EXPLAIN output for this query:

id select_​type table type possible_​keys key key_​len ref rows Extra
1 PRIMARY <derived2> ref <auto_key0> <auto_key0> 4 t2.o_orderkey 599860 Using where; Using index
3 DERIVED orders ALL i_o_orderdate NULL NULL NULL 15000000 Using where
2 DERIVED lineitem index PRIMARY, i_l_shipdate, … PRIMARY 8 NULL 59986052 NULL

EXPLAIN shows that there are three query blocks. The main/outer query with ID 1 and the two subqueries with IDs 2 and 3. We see that the outer query will access the tables <derived2> and <derived3>, which are the temporary tables generated by queries 2 and 3, respectively. We also see that an index, <auto_key0>, is created on <derived2> and will be used to find matches for o_orderkey. (Note that the ability to create indexes on derived tables was introduced in MySQL 5.6. Running this query will take several months with MySQL 5.5 since without an index, MySQL would have to do a full scan of one of the temporary tables for every row of the other temporary table.)

In MySQL 5.7 the optimizer will try to merge derived tables into the outer query block. However, this is not possible for all subqueries. Visual EXPLAIN shows that only one of the subqueries are actually merged:

Query Plan in MySQL 5.7

The subquery for the derived table t1 can not be merged because it has a GROUP BY clause. Hence, MySQL 5.7 will materialize the result of this subquery, scan the resulting temporary table, and do primary-keys looks-up directly on the orders table. The tabular EXPLAIN output also shows that one of the subqueries have been merged into the outer query block:

id select_​type table part. type possible_​keys key key_​len ref rows filtered Extra
1 PRIMARY <derived2> NULL ALL NULL NULL NULL NULL 59986052 100.00 NULL
1 PRIMARY orders NULL eq_ref PRIMARY, i_o_orderdate PRIMARY 4 t1.l_orderkey 1 10.69 Using where
2 DERIVED lineitem NULL index PRIMARY, i_l_shipdate, … PRIMARY 8 NULL 59986052 100.00 NULL

As you can see, query block #3 has been merged into query block #1.

The result of running this query on a DBT-3 scale factor 10 database shows that MySQL 5.7 cuts the execution time in half compared to MySQL 5.6:

Query Execution Time

The above query example was part of my tutorial “How to Analyze and Tune MySQL Queries for Better Performance” at Oracle OpenWorld 2015. Slides for the entire presentation can be found here.

One thought on “MySQL 5.7: Improved Performance of Queries with Derived Tables

  1. Thanks, i upgraded to mysql 5.6 a few weeks ago, and i want to upgrade to 5.7 in 2 or 3 months more, this article give me hopes to do it.

Leave a Reply

Your email address will not be published. Required fields are marked *

Please enter * Time limit is exhausted. Please reload CAPTCHA.