In MySQL 5.7, we have improved the scalability of DML-oriented workloads in InnoDB. This is the result of a number of changes, which I will outline below.
- (1) Fix index->lock contention
- (2) Page cleaner thread optimizations
- (3) log_sys->mutex optimization
- (4) Avoiding the ‘read-on-write’ during transaction log writing
- (5) Future improvements
(1) Fix index->lock contention
This RW lock protects all indexes, both the clustered index and the secondary indexes.
Before 5.7, every modification to non-leaf pages (that is, every modification of the tree structure) had to exclude other threads from the whole index by taking an X-lock, so all concurrent access to the index tree was blocked. This was the major cause of index->lock contention in concurrent DML workloads.
In MySQL 5.7 concurrent access is now permitted to the non-leaf pages (internal nodes of the B+Tree) as long as they are not related to the concurrent tree structure modifications (WL#6326). This change reduces the major point of contention.
(2) Page cleaner thread optimizations
In MySQL 5.6, we introduced a dedicated page cleaner thread to handle background operations, including flushing dirty pages from the buffer pool to storage and maintaining a sufficient number of free pages. Separating this task into its own thread frees user threads from doing this additional work. This has reduced the CPU cost and should solve some cases of CPU-bound problems. However, in some DML-oriented workloads there were still too many tasks for a single page cleaner thread to keep up with. This could reduce performance, because user threads were then required to flush pages and keep sufficient pages free themselves.
In MySQL 5.7, there have been two improvements in this area:
- The buffer pool list scans (e.g. flush_list, LRU) for flushing have been optimized and reduced in cost (WL#7047). This also improves the flush/evict page operation that user threads perform to obtain a free page, which is necessary when the page cleaner thread has fallen too far behind. This change lowers the performance risk when the page cleaner is not able to perform enough work due to sub-optimal configuration settings.
- Multiple page cleaner threads are now supported, allowing these operations to occur in parallel (WL#6642).
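To make the second point concrete, here is a toy model (not InnoDB code) of one flushing round in which several cleaner threads each take a buffer pool instance and flush a bounded batch from it. The names flush_instance, run_cleaner_round, cleaner_threads and batch_limit are hypothetical, and the assumption that work is divided per buffer pool instance is mine, made for illustration only. Python threads will not show a real speedup here; the sketch only illustrates how the work is split.

```python
from concurrent.futures import ThreadPoolExecutor


def flush_instance(dirty_pages, batch_limit):
    """Flush up to batch_limit pages from one (toy) buffer pool instance.

    dirty_pages is a plain list standing in for that instance's dirty pages.
    Returns the number of pages flushed.
    """
    flushed = min(len(dirty_pages), batch_limit)
    del dirty_pages[:flushed]  # "write out" the oldest pages first
    return flushed


def run_cleaner_round(instances, cleaner_threads, batch_limit):
    """Run one flushing round over all instances using several workers.

    Assumption: more workers than instances would be pointless in this model,
    so the worker count is capped at the number of instances.
    """
    workers = max(1, min(cleaner_threads, len(instances)))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda pages: flush_instance(pages, batch_limit),
                             instances))


if __name__ == "__main__":
    pools = [list(range(10)), list(range(3)), list(range(7))]
    print(run_cleaner_round(pools, cleaner_threads=2, batch_limit=5))
```

With a single worker the instances are processed one after another, which corresponds to the 5.6 situation described above; with more workers, a slow instance no longer delays the others.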
(3) log_sys->mutex optimization
MySQL 5.7 reduces the impact of log_sys->mutex, which is held to control access to the log buffer and log writing. The impact of this change is most visible when innodb_flush_log_at_trx_commit=2, because with this change a log write that does not need a sync is no longer blocked waiting for a sync to finish.
(4) Avoiding the ‘read-on-write’ during transaction log writing
The InnoDB transaction log is written in blocks of 512 bytes, which is often smaller than the block size of the underlying device or file system. If the relevant part of the transaction log is not resident in the OS cache, a read may be required to load the remainder of the underlying device’s block, write the InnoDB transaction log block in place, and then write out the underlying block. We refer to this problem as a read-on-write: the read only serves to preserve old transaction log contents that do not need to be preserved.
In MySQL 5.7 we address this problem by adding a new option, innodb_log_write_ahead_size. This allows the user to effectively pad write operations to complete the full block of the underlying device or file system, negating the need for a read-on-write modification. This change results in more stable log throughput, as there will no longer be a situation where some writes are effectively cached and others are not.
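The padding idea can be illustrated with a little arithmetic. The sketch below is a simplification and not InnoDB's actual logic: it only rounds the end offset of a log write up to the next multiple of the write-ahead size, so that the write covers whole underlying blocks. The function names and the rule that the write-ahead size must be a multiple of the 512-byte log block are assumptions made for this example.

```python
LOG_BLOCK_SIZE = 512  # InnoDB log block size, as stated in the text


def pad_to_write_ahead(write_end, write_ahead_size):
    """Round the end offset of a log write up to a write-ahead boundary.

    write_end: byte offset in the log file where the real data ends.
    write_ahead_size: stands in for innodb_log_write_ahead_size (bytes).
    Assumption for this sketch: it must be a positive multiple of 512.
    """
    if write_ahead_size <= 0 or write_ahead_size % LOG_BLOCK_SIZE != 0:
        raise ValueError("write_ahead_size must be a positive multiple of 512")
    remainder = write_end % write_ahead_size
    if remainder == 0:
        return write_end  # already ends on a boundary, no padding needed
    return write_end + (write_ahead_size - remainder)


def padding_bytes(write_end, write_ahead_size):
    """Number of filler bytes appended to complete the underlying block."""
    return pad_to_write_ahead(write_end, write_ahead_size) - write_end


if __name__ == "__main__":
    # One 512-byte log block written into a fresh 4096-byte file system block.
    print(pad_to_write_ahead(512, 4096), padding_bytes(512, 4096))
```

In this model, a single 512-byte log block written at the start of a 4096-byte file system block is padded by 3584 bytes, so the OS can write the whole block without first reading it.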
We continue to investigate other ways of addressing this problem. For example, on an SSD, deallocation like FALLOC_FL_PUNCH_HOLE might be better if it is supported.
(5) Future improvements
We are continuing to focus on improving DML performance for 5.7. Some of our next areas of research include:
- Implementing improvements to the adaptive flushing algorithm (suggested by Dimitri Kravtchuk)
- Setting a thread priority for the page_cleaner (on Linux for now)
- Addressing an issue where flushing can become overloaded when the age of the oldest modification reaches max_modified_age_sync. The goals are to lower the risk of reaching max_modified_age_sync and to sustain proper throughput while flushing around max_modified_age_sync.
- Introducing a page fill factor to control the frequency of merges/splits of the index pages
As a result of the above improvements (including the future work), MySQL 5.7 will respect configuration settings much more closely, and adjusting settings to reflect the I/O capabilities of the underlying hardware device(s) will be more important for optimizing throughput. For example, settings that are too conservative may prevent the page cleaner thread from completing enough work.
innodb_io_capacity_max ≤ [actual max write pages/s]
As a result of these adjustments, 5.7 will always try to respect innodb_io_capacity_max for flush_list flushing. If the amount of outstanding work is too large, the page cleaner might spend too much time performing flush_list flushing and not complete some of the other tasks required of it. The actual maximum “write pages/s” can be confirmed, for example, by watching the PAGES_WRITTEN_RATE value in INFORMATION_SCHEMA.INNODB_BUFFER_POOL_STATS.
innodb_buffer_pool_instances × innodb_lru_scan_depth ≥ [actual max read pages/s]
The setting innodb_lru_scan_depth can now be considered the target number of free pages for each buffer pool instance during the page cleaner’s flushing operation. A single round of page cleaner tasks is also intended to complete within one second, so “read pages/s” is affected by innodb_buffer_pool_instances × innodb_lru_scan_depth. Setting innodb_lru_scan_depth to a very high value is not recommended, because the batch that keeps free pages available might take too long. (The actual maximum “read pages/s” can be confirmed, for example, by watching the PAGES_READ_RATE value in INFORMATION_SCHEMA.INNODB_BUFFER_POOL_STATS.)
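The two rules of thumb above can be turned into a quick sanity check. The following is a minimal sketch: the function name check_flush_settings and its parameters are hypothetical, and the measured rates are assumed to be peak values you have collected yourself (for example from PAGES_WRITTEN_RATE and PAGES_READ_RATE). It does not connect to a server.

```python
def check_flush_settings(io_capacity_max, buffer_pool_instances,
                         lru_scan_depth, max_write_pages_per_s,
                         max_read_pages_per_s):
    """Compare settings with measured peak page rates.

    Returns a list of warning strings; an empty list means both rules of
    thumb from the text hold for the given numbers.
    """
    warnings = []

    # Rule 1: innodb_io_capacity_max <= actual max write pages/s
    if io_capacity_max > max_write_pages_per_s:
        warnings.append(
            "innodb_io_capacity_max (%d) exceeds measured max write pages/s (%d)"
            % (io_capacity_max, max_write_pages_per_s))

    # Rule 2: buffer pool instances x innodb_lru_scan_depth >= max read pages/s
    free_page_target = buffer_pool_instances * lru_scan_depth
    if free_page_target < max_read_pages_per_s:
        warnings.append(
            "instances x innodb_lru_scan_depth (%d) is below measured max "
            "read pages/s (%d)" % (free_page_target, max_read_pages_per_s))

    return warnings


if __name__ == "__main__":
    # Example numbers are made up for illustration.
    for line in check_flush_settings(10000, 1, 1024, 5000, 6000):
        print(line)
```

An empty result only means the numbers are consistent with the two inequalities; as noted above, a very high innodb_lru_scan_depth is still not recommended even if it satisfies the second one.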