Better Performance for JOINs Not Using Indexes

In some cases it is not possible to use an index to optimize a JOIN. This may for example happen when you query the Performance Schema. As a result these kind of queries can be very slow; however in MySQL 5.6 and later you can use a trick to improve the performance considerably.

As a working example in this post, I will use the schema_table_statistics view in the sys schema. Since the view involves the schema, I will create a reasonable large number of databases and tables for the test:

The current view definition for sys.schema_table_statistics is:

The issue is the LEFT JOIN on performance_schema.file_summary_by_instance. The Performance Schema tables do not have any indexes, so the query will need to examine the cross product:

To ensure that some activity has happened on all tables, a mysqldump backup is made:
shell$ mysqldump --all-databases > /dev/null

The original query takes 14 minutes and 24 seconds.

The trick to improve the performance is to rewrite the query to take advantage of the materialized temporary tables that MySQL 5.6 supports. To do this, change the LEFT JOIN on performance_schema.file_summary_by_instance to a LEFT JOIN on a subquery:

Now the output of EXPLAIN looks like:

The key here is the <auto_key0> index used for the LEFT JOIN between the pst table and the <derived2> table. This means the optimizer adds an ad-hoc index.

The performance difference is clear: the query now completes in around 20 seconds.

However the original query is a view definition and subqueries are not allowed in view definitions, so how to improve the view itself?

This can be done by creating a “helper view” instead of the subquery. The important thing to be aware of is that MySQL has two algorithms for creating a view:

  1. MERGE

MERGE will merge the corresponding parts of the view into the statement that refers to the view. So a view using the MERGE algorithm will not work as that will effectively take us back to the original view. However TEMPTABLE will materialize the view before using it with the rest of the statement, so that will do what we need.

So the view definition we need in this case is:

which then can be used in the view definition for schema_table_statistics:

Note: to take fully advantage of this rewrite, the extract_schema_from_file_name() and extract_table_from_file_name() functions needs to be changed from their current definitions (this has been done throughout the examples above). The current definition in the sys schema is that they return a varchar(512) – the same as the argument they take. However as the widest allowed schema and table name is a varchar(64), it is safe to change the return value to a varchar(64). The reason this is important is that the maximum width supported for a MyISAM index is 1000 bytes, so two columns each a UTF8 vachar(512) is too wide to allow for an auto key to be created.

Thanks a lot to Roy Lyseng and Mark Leith who is behind the ideas for this rewrite.

If you are interested to hear more about the Performance Schema, the MySQL sys schema, and performance tuning, then you should consider the following sessions at MySQL Central at Oracle OpenWorld:

4 thoughts on “Better Performance for JOINs Not Using Indexes

  1. I should have read this post a few days ago as this is an issue I have been having on some of my servers. mysql-sys is great but as you describe the performance of this specific view has been quite nasty on busy servers with a lot of tables.

    I came up with a different solution using the old views and that was to write a stored procedure which created a temporary table by selecting from each of the p_s tables, manually adding indexes to it after which the select was done with a join. That worked pretty well and had about the same performance as your more flexible solution so I had been about to roll that out on my servers.

    So it is certainly worth pointing out to people using mysql-sys to upgrade to the latest version (1.1.0) as the performance improvement is really significant for this specific view. I guess others can possibly be adjusted in a similar fashion.

    I also wonder if for the normal SELECT case MySQL could be made smarter and use this LEFT JOIN with sub query when it makes sense to do so as the provides such a significant performance increase that it would be really useful. Not having to make such manual changes in a query would be ideal.

    So thanks for the post and thanks for improving the mysql-sys queries: this change is dramatic and really helps.

Leave a Reply

Your email address will not be published. Required fields are marked *

Please enter * Time limit is exhausted. Please reload CAPTCHA.