MySQL Shell 8.0.21 comes with two utilities which can be used to perform logical dumps of all the schemas from an instance (util.dumpInstance()) or selected schemas (util.dumpSchemas()). These tools offer major performance improvements over the mysqldump utility; some of our benchmarks show a throughput of up to 3 GB/s! Let’s see how this is achieved.
This is part 4 of the blog post series about MySQL Shell Dump & Load:
- MySQL Shell Dump & Load part 1: Demo!
- MySQL Shell Dump & Load part 2: Benchmarks
- MySQL Shell Dump & Load part 3: Load Dump
The dump process uses multiple threads to perform its tasks (the number of threads can be specified using the threads option), allowing for parallel execution of time-consuming operations. Each thread opens its own connection to the target server and can be working on dumping data, dumping DDL, or splitting the table data into chunks.
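The fan-out of work across threads can be sketched in plain Python. This is only an illustration, not the Shell’s actual implementation; dump_chunk is a hypothetical stand-in for the per-chunk dump task:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the per-chunk dump task; a real worker would
# use its own server connection and write rows to an output file here.
def dump_chunk(chunk_id):
    return f"chunk-{chunk_id}.tsv"

# The worker-pool size plays the role of the threads option.
with ThreadPoolExecutor(max_workers=4) as pool:
    files = list(pool.map(dump_chunk, range(8)))

print(files)
```

Because the chunks are independent, adding workers scales the dump until the server or storage becomes the bottleneck.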
If the consistent option is set to true (which is the default value), the dump is going to be consistent (as long as the dumped tables use the InnoDB engine). When the dump process is started, a global read lock is set by the global Shell session using the FLUSH TABLES WITH READ LOCK statement. Next, all threads establish connections with the server and start transactions using:
SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ
START TRANSACTION WITH CONSISTENT SNAPSHOT
Once all the threads have started their transactions, the instance is locked for backup (using the LOCK INSTANCE FOR BACKUP statement) and the global read lock is released.
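The hand-off described above can be sketched with a recording stub in place of real server sessions. This is a simplified illustration of the ordering only; the dumper’s actual session management is more involved:

```python
# Recording stub: each "execute" just logs which session ran which SQL.
log = []

def execute(session, sql):
    log.append((session, sql))

# 1. The global Shell session takes the global read lock.
execute("global", "FLUSH TABLES WITH READ LOCK")

# 2. Every worker thread opens its own snapshot transaction.
for worker in range(4):
    execute(worker, "SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ")
    execute(worker, "START TRANSACTION WITH CONSISTENT SNAPSHOT")

# 3. Only after all snapshots exist is the instance locked for backup
#    and the global read lock released.
execute("global", "LOCK INSTANCE FOR BACKUP")
execute("global", "UNLOCK TABLES")
```

The key invariant is that no worker snapshot starts after the read lock is dropped, so every thread sees the same point-in-time view of InnoDB tables.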
It’s usually the case that the sizes of tables in a schema vary a lot: there are a few very large tables and many smaller ones. In such a case, adding more threads alone will not speed up the dumping process. To overcome this problem, the data in tables can be divided into smaller chunks, each of which is dumped to a separate file by one of the threads.
Our chunking algorithm strives to find the primary key (or unique index) values that partition the table rows into chunks of approximately equal size. It uses EXPLAIN statements to determine the chunk sizes, acquiring row count estimates from the optimizer, as opposed to e.g. SELECT COUNT(*), which would need to scan rows one by one. These estimates are often not very accurate, but obtaining them is significantly faster, and they are good enough for our purposes.
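A minimal sketch of this idea, assuming an integer primary key and a row estimate of the kind EXPLAIN would return (the function name and parameters here are hypothetical, not the Shell’s internals):

```python
# Split a PK range into chunks of roughly bytes_per_chunk each, using only
# an optimizer-style row estimate and an average row size. With skewed key
# distributions the resulting chunks are only approximately equal.
def chunk_boundaries(pk_min, pk_max, estimated_rows, avg_row_bytes,
                     bytes_per_chunk=32 * 1024 * 1024):
    rows_per_chunk = max(1, bytes_per_chunk // avg_row_bytes)
    n_chunks = max(1, -(-estimated_rows // rows_per_chunk))  # ceil division
    step = (pk_max - pk_min + 1) / n_chunks
    # Each pair becomes a "WHERE pk >= lo AND pk < hi" range.
    return [(pk_min + round(i * step), pk_min + round((i + 1) * step))
            for i in range(n_chunks)]

bounds = chunk_boundaries(pk_min=1, pk_max=1_000_000,
                          estimated_rows=1_000_000, avg_row_bytes=100)
print(bounds)
```

Note that nothing here scans the table: one estimate is enough to derive all the ranges, which is why inaccurate statistics are an acceptable trade-off.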
Chunking is turned on by default and results in files containing roughly 32 MB of uncompressed data. These features can be configured using the chunking and bytesPerChunk options respectively; it’s usually a good idea to increase the latter if your data set is very large.
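As a back-of-the-envelope illustration of how bytesPerChunk shapes the output (the table size here is made up), a 10 GB table at the default ~32 MB per chunk produces a few hundred files:

```python
import math

# Illustrative arithmetic: number of data files one table would produce.
table_bytes = 10 * 1024**3        # hypothetical 10 GB table
bytes_per_chunk = 32 * 1024**2    # the default chunk size

n_files = math.ceil(table_bytes / bytes_per_chunk)
print(n_files)  # 320
```

Raising bytesPerChunk on very large data sets keeps the file count manageable while still leaving enough chunks for all threads to stay busy.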
The mysqldump and mydumper utilities write the data as a series of INSERT statements, while the new Shell utilities use the default format expected by the LOAD DATA statement. This results in output files that are smaller on average, meaning that they can be produced more quickly. As an additional benefit, loading such files is much faster.
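A quick way to see why the tab-separated LOAD DATA format is more compact than INSERT statements is to encode the same row both ways (the table and values are made up):

```python
# The same row in the two formats: an INSERT statement repeats SQL
# keywords, identifiers, parentheses, and per-value quoting on every row,
# while the LOAD DATA default format is just tab-separated values.
insert_stmt = "INSERT INTO `users` VALUES (42,'Alice','alice@example.com');\n"
tsv_line = "42\tAlice\talice@example.com\n"

print(len(insert_stmt), len(tsv_line))
assert len(tsv_line) < len(insert_stmt)
```

The per-row overhead of the INSERT form is fixed, so the relative saving is largest for tables with short rows.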
By default, all data files are compressed using the zstd algorithm, which offers a good balance between compression ratio and encoding performance. The other supported algorithm is gzip. Compression can also be disabled by setting the compression option to none.
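Python’s standard library has no zstd module, so this sketch uses gzip, the other supported algorithm, to illustrate how well the repetitive tab-separated output compresses (the data is synthetic):

```python
import gzip

# Synthetic stand-in for a chunk of tab-separated dump output; real table
# data is typically highly repetitive and compresses very well.
data = b"42\tAlice\talice@example.com\n" * 10_000
compressed = gzip.compress(data)

print(len(data), len(compressed))
```

With zstd the dumper gets a similar ratio at a considerably lower CPU cost per byte, which is why it is the default.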
The code of the dumper has been heavily benchmarked. In the course of this process we identified and fixed various issues throughout the Shell’s code base, like unnecessary data copies, redundant function calls, and superfluous virtual calls. The “hot” function, the one that writes data to the output files, has been carefully analyzed and optimized.
For a performance comparison of the two new Shell utilities against mysqldump and mydumper, please see MySQL Shell Dump & Load part 2: Benchmarks.
The util.dumpInstance() and util.dumpSchemas() utilities provide some major performance boost over the existing tools, but there’s still some room for improvement. In particular, our chunking algorithm could certainly be optimized, and this is something that we plan to tackle in the future. Stay tuned!