In a previous blog post we presented some of the issues with the current optimizer cost model and listed several ideas for improvements. The new 5.7.5 DMR contains the result of our initial work on improving the optimizer’s cost model:
- Cost Model for WHERE Conditions. In previous versions of MySQL, the estimated number of rows from a table that will be joined with the next table only takes into account the conditions used by the access method. This often led to record estimates that were far too high and thus to a very wrong cost estimate for the join. With wrong cost estimates, the join optimizer might not find and choose the best join order. To solve this issue, a cost model that includes the entire query condition on the table when calculating the number of rows produced from a table has been implemented. This model estimates the filtering effect that the table’s conditions have on the number of records read from the table. For more details on how condition filtering works, see two earlier blog posts on this topic: part1, part2. With this feature, the join optimizer should in many cases be able to find a more optimal join order and produce a better query plan.
- More Accurate Index Statistics. When joining tables, index statistics are used for selecting which index to use. In MySQL the index statistics contain an estimate for the number of records that exist per key value within the index. Due to the previous use of integer values for this records per key estimate, the index statistics had a precision that was too low for indexes with high selectivity or cardinality. There had previously been no means to weigh the relative value of an index that has an average of 1.9 records per key value from another one that has 1.1 records per key value. This problem is now fixed by switching to the use of floating point values for the index statistics within the MySQL server. Since the index statistics are maintained by the storage engines, the more accurate index statistics must be supplied by the storage engine. In the 5.7.5 DMR, InnoDB has been changed to provide the more accurate index statistics using the new floating point format.
- Configurable Cost Constants. Previously, the basic cost values that the optimizer used when calculating cost estimates for queries were hard coded within the source code. Examples of such cost constants were the cost of reading a disk page, the cost of evaluating a query condition on a record, or the cost of comparing key values. The hard coded cost constants have now been replaced by user configurable costs. Two new tables—server_cost and engine_cost—have been added to the mysql system database. The cost values can now be changed by simply updating the corresponding rows in these two new tables.
We have also done some initial refactoring of the cost model code. The main focus here has been to replace the use of hard coded cost constants with function calls into the new cost model API. As part of this work we also introduced a new cost_estimate object that’s used for keeping track of cost estimates within the new cost model code.
If you are attending MySQL Central @ OpenWorld next week, I will present the MySQL Cost Model talk, where I will go into greater detail about all of the new cost model related features that are implemented in the new 5.7.5 DMR. I look forward to seeing you there!
We value your feedback! Please let us know if you encounter any bugs, or if you have more general feedback.
UPDATE: My OpenWorld presentation is now available here!