In some application scenarios (e.g. PHP applications) client connections have very short durations, maybe only executing a single query. This means that the time spent processing connects and disconnects can have a large impact on the overall performance.
In MySQL 5.6 we started working on optimizing the code handling connects and disconnects, and this work has accelerated in MySQL 5.7. In this blog post I will first show the results we have achieved and then describe what we have done to get them.
The graph below shows a comparison of the most recent 5.5 and 5.6 releases as well as the 5.7.2 and 5.7.3 milestones. We measured the number of queries per second (QPS) where each client executes a single query (point select) before disconnecting. For each server version we also tested with Performance Schema both disabled and enabled. Details about the server hardware and configuration settings used are given at the end of this blog post.
The results show that we have achieved a +220 % improvement compared to the current 5.5 version and a +110-150 % improvement compared to the current 5.6 version! We also have higher performance with Performance Schema on in 5.7.3 than we had with Performance Schema turned off in 5.7.2. In fact, the Performance Schema overhead for processing connects/disconnects is lower in 5.7.3 than in both 5.5.35 and 5.6.15.
How we got them
Interestingly enough, we started out by trying to fix a rare crashing bug on SPARC. Each connection is represented in the server by a THD object. These objects were organized in a global intrusive list (through inheritance) and removal from the list was done in THD’s destructor. In some rare cases and at high compiler optimization levels, the inheritance could corrupt the THD objects, resulting in a crash. This crashing bug was fixed in MySQL 5.6.5.
While fixing it we realized that the destructor (because it removed itself from a global list) was called with a global mutex (LOCK_thread_count) held. Since THD is a pretty large object and the mutex is central to connects/disconnects, this was obviously not ideal. So in MySQL 5.6.6 we refactored the THD class so the destructor could be run without holding a global mutex. We also started the process of refactoring and improving the maintainability of the server code handling connects/disconnects.
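The idea of running the destructor without holding the global mutex can be illustrated with a minimal sketch. This is not the actual server code: Connection, ConnectionRegistry and disconnect are hypothetical names standing in for THD, the global connection list and the disconnect path. The lock is held only while unlinking the object from the list, and the (potentially expensive) destruction happens after the lock has been released.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <memory>
#include <mutex>
#include <utility>

// Hypothetical stand-in for the per-connection object (THD in the server).
struct Connection {
    explicit Connection(int id) : id(id) {}
    int id;
};

// Hypothetical registry playing the role of the global connection list.
class ConnectionRegistry {
public:
    void add(std::unique_ptr<Connection> conn) {
        assert(conn != nullptr);
        std::lock_guard<std::mutex> guard(mutex_);
        connections_.push_back(std::move(conn));
    }

    // Unlinks the connection while holding the lock, but hands ownership
    // back to the caller instead of destroying the object here.
    std::unique_ptr<Connection> remove(int id) {
        std::lock_guard<std::mutex> guard(mutex_);
        for (auto it = connections_.begin(); it != connections_.end(); ++it) {
            if ((*it)->id == id) {
                std::unique_ptr<Connection> owned = std::move(*it);
                connections_.erase(it);
                return owned;
            }
        }
        return nullptr;
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> guard(mutex_);
        return connections_.size();
    }

private:
    mutable std::mutex mutex_;
    std::list<std::unique_ptr<Connection>> connections_;
};

// Returns true if a connection with the given id was found and destroyed.
bool disconnect(ConnectionRegistry& registry, int id) {
    std::unique_ptr<Connection> conn = registry.remove(id);  // lock released on return
    if (!conn) return false;
    conn.reset();  // destructor runs here, with no registry lock held
    return true;
}
```

With this shape, other threads that connect or disconnect only wait for the short list update, not for the teardown of a large object.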
Around the same time, we got a number of bug reports related to connects/disconnects from Domas Mituzas of Facebook: Bug#62282, Bug#62283, Bug#62284, Bug#62285, Bug#62286. Some of them (62282 and 62285) were limited in scope, so the fixes were implemented in 5.6.
In any case, for MySQL 5.6 our focus was on preparatory refactoring, not on performance improvements. Still, the improvements that did result did not go unnoticed, as evidenced by this blog post by Yoshinori Matsunobu, also from Facebook.
For MySQL 5.7.2, we made a worklog mostly based on three of Domas’ remaining bug reports (62283, 62284, 62288). Before this worklog, much of the initialization of new connections, including construction of the THD object, was done by the thread accepting new connections. By moving this work to the client thread, the acceptor thread was ready to accept a new connection much sooner. This worklog also included significant code refactoring, moving from C-like code to more modern C++ code.
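As a rough illustration of moving the initialization work off the acceptor thread, here is a small sketch using std::thread. The names Session, handle_connection and accept_one are hypothetical and do not correspond to the server source; the point is only that the acceptor hands over a cheap handle (here a socket descriptor) and the per-connection thread builds the heavy object itself.

```cpp
#include <cassert>
#include <string>
#include <thread>

// Hypothetical heavy per-connection state (the role THD plays in the server).
struct Session {
    explicit Session(int socket_fd)
        : socket_fd(socket_fd), buffer(16 * 1024, '\0') {}
    int socket_fd;
    std::string buffer;
};

// Runs on the per-connection thread: the expensive construction happens here,
// not on the thread that accepted the connection.
void handle_connection(int socket_fd, std::thread::id* constructed_on) {
    assert(socket_fd >= 0);
    Session session(socket_fd);
    *constructed_on = std::this_thread::get_id();
    // A real server would authenticate and execute queries at this point.
}

// Runs on the acceptor thread: only hands off the descriptor and returns,
// so the acceptor can go straight back to accepting the next connection.
std::thread accept_one(int socket_fd, std::thread::id* constructed_on) {
    return std::thread(handle_connection, socket_fd, constructed_on);
}
```

The constructed_on parameter exists only so the sketch can show which thread built the Session; it has no counterpart in a real server.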
After 5.7.2 was released, we asked our performance architect Dmitri Kravtchuk to investigate where the remaining connect/disconnect bottlenecks were. It turned out that the main bottleneck was the LOCK_thread_count global mutex, which, among other things, protected the global list of connections. As a result of the refactorings we had done earlier in 5.6 and 5.7.2, it was quite easy to split this mutex so that it is no longer used for several different purposes. Now we have one mutex for the connection list, one mutex for the thread cache, atomics for the thread ID counter, etc. Together with a minor improvement related to accounting of prepared statements, this gave us the performance improvements seen in 5.7.3.
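The mutex split can be sketched as follows. This is a simplified illustration, not the server implementation: the class ConnectionBookkeeping and all of its members are hypothetical names. What it shows is the general pattern of replacing one lock that guards unrelated state with one mutex per data structure and a lock-free atomic for the ID counter.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <set>
#include <vector>

// Hypothetical bookkeeping class: each piece of state has its own
// synchronization instead of sharing a single global mutex.
class ConnectionBookkeeping {
public:
    // Thread IDs come from an atomic counter, so no mutex is needed at all.
    std::uint64_t next_thread_id() {
        return next_thread_id_.fetch_add(1, std::memory_order_relaxed);
    }

    void register_connection(std::uint64_t id) {
        std::lock_guard<std::mutex> guard(connection_list_mutex_);
        connections_.insert(id);
    }

    // Returns true if the connection was registered and has been removed.
    bool unregister_connection(std::uint64_t id) {
        std::lock_guard<std::mutex> guard(connection_list_mutex_);
        return connections_.erase(id) == 1;
    }

    std::size_t connection_count() const {
        std::lock_guard<std::mutex> guard(connection_list_mutex_);
        return connections_.size();
    }

    // The thread cache has its own mutex, so caching an idle thread does not
    // contend with threads that are updating the connection list.
    void cache_idle_thread(std::uint64_t id) {
        std::lock_guard<std::mutex> guard(thread_cache_mutex_);
        idle_threads_.push_back(id);
    }

    // Returns true and stores a cached thread ID in *out if one is available.
    bool take_cached_thread(std::uint64_t* out) {
        assert(out != nullptr);
        std::lock_guard<std::mutex> guard(thread_cache_mutex_);
        if (idle_threads_.empty()) return false;
        *out = idle_threads_.back();
        idle_threads_.pop_back();
        return true;
    }

private:
    std::atomic<std::uint64_t> next_thread_id_{1};

    mutable std::mutex connection_list_mutex_;
    std::set<std::uint64_t> connections_;

    std::mutex thread_cache_mutex_;
    std::vector<std::uint64_t> idle_threads_;
};
```

A connecting client that only needs a new ID never takes a lock, and a disconnecting client that only touches the connection list never waits on the thread cache.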
In parallel with this work, Marc Alff has also been busy reducing Performance Schema (PFS) overhead. Each time a client disconnects, many PFS statistics have to be maintained. Even though PFS was greatly extended in 5.6, and 5.7 so far has added e.g. memory, metadata lock and transaction instrumentation, the overhead in 5.7.3 is actually slightly lower than in 5.5!
Thanks to Vince Rezula and Avinash Potnuru for help with running performance benchmarks!
Tests were executed on a server with Intel Xeon X7560 (4 sockets, 8 cores, 64 threads) @ 2.27 GHz, using the following cnf:
innodb_adaptive_flushing = 1
innodb_flush_method = O_DIRECT