In a recent blog post my colleague Roy presented his work for MySQL 5.7 that makes the query optimizer merge sub-queries in the FROM clause (so-called derived tables) with the outer query. I will in this blog post show an example of how this work improves the performance of a MySQL query.
Using the schema from the DBT-3 database, we will look at the following query containing two derived tables:
SELECT t2.o_clerk, t1.price - t2.o_totalprice
(SELECT l_orderkey, SUM( l_extendedprice * (1 - l_discount)) price
FROM lineitem GROUP by l_orderkey) t1
(SELECT o_clerk, o_orderkey, o_totalprice
WHERE o_orderdate BETWEEN '1995-01-01' AND '1995-12-31') t2
ON t1.l_orderkey = t2.o_orderkey WHERE t1.price > t2.o_totalprice;
Prior to 5.7, MySQL would have to execute each subquery individually and store the results in two temporary tables that are then joined together:
The above diagram was generated by the Visual EXPLAIN feature in MySQL Workbench. Each subquery is within a box that represent a temporary table. The Visual EXPLAIN diagram shows that the temporary table for the derived table t2 will be scanned, and for each row, an index lookup into the temporary table for t1 will be made. This can also be seen from the traditional tabular EXPLAIN output for this query:
|1||PRIMARY||<derived2>||ref||<auto_key0>||<auto_key0>||4||t2.o_orderkey||599860||Using where; Using index|
|2||DERIVED||lineitem||index||PRIMARY, i_l_shipdate, …||PRIMARY||8||NULL||59986052||NULL|
EXPLAIN shows that there are three query blocks. The main/outer query with ID 1 and the two subqueries with IDs 2 and 3. We see that the outer query will access the tables <derived2> and <derived3>, which are the temporary tables generated by queries 2 and 3, respectively. We also see that an index, <auto_key0>, is created on <derived2> and will be used to find matches for o_orderkey. (Note that the ability to create indexes on derived tables was introduced in MySQL 5.6. Running this query will take several months with MySQL 5.5 since without an index, MySQL would have to do a full scan of one of the temporary tables for every row of the other temporary table.)
In MySQL 5.7 the optimizer will try to merge derived tables into the outer query block. However, this is not possible for all subqueries. Visual EXPLAIN shows that only one of the subqueries are actually merged:
The subquery for the derived table t1 can not be merged because it has a GROUP BY clause. Hence, MySQL 5.7 will materialize the result of this subquery, scan the resulting temporary table, and do primary-keys looks-up directly on the orders table. The tabular EXPLAIN output also shows that one of the subqueries have been merged into the outer query block:
|1||PRIMARY||orders||NULL||eq_ref||PRIMARY, i_o_orderdate||PRIMARY||4||t1.l_orderkey||1||10.69||Using where|
|2||DERIVED||lineitem||NULL||index||PRIMARY, i_l_shipdate, …||PRIMARY||8||NULL||59986052||100.00||NULL|
As you can see, query block #3 has been merged into query block #1.
The result of running this query on a DBT-3 scale factor 10 database shows that MySQL 5.7 cuts the execution time in half compared to MySQL 5.6:
The above query example was part of my tutorial “How to Analyze and Tune MySQL Queries for Better Performance” at Oracle OpenWorld 2015. Slides for the entire presentation can be found here.