Contention-Aware Transaction Scheduling Arriving in InnoDB to Boost Performance

Authors: Sunny Bains, Jiamin Huang (University of Michigan)

What is Transaction Scheduling?

Locking is one of the most popular mechanisms for concurrency control in database systems, including Oracle MySQL. One major question, however, seems to have been overlooked by all database vendors:

Q: When multiple transactions are waiting for a lock on the same object, which one(s) should get the lock first?

Nearly all database systems, including previous versions of MySQL, rely on some variant of the First-In-First-Out (FIFO) policy. In a nutshell, FIFO grants locks to the transactions that requested them first (i.e., those at the head of the queue), unless they are incompatible with the locks that are currently granted. Because of its simplicity, most database vendors have settled for FIFO and its variants without considering other strategies.
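To make the FIFO policy concrete, here is a minimal, self-contained C++ sketch. It is not InnoDB's actual lock_sys code; the two-mode compatibility check and the names (LockMode, LockRequest, grant_fifo) are simplified assumptions used only for illustration. The point it shows: locks are granted strictly in arrival order, and granting stops at the first waiter that conflicts with what is already held.

#include <deque>
#include <iostream>
#include <vector>

// Illustrative only; real InnoDB has a richer lock-mode lattice (IS, IX, S, X, ...).
enum class LockMode { Shared, Exclusive };

struct LockRequest {
  int trx_id;     // transaction asking for the lock
  LockMode mode;  // requested mode
};

// Two lock modes are compatible only if both are shared.
static bool compatible(LockMode a, LockMode b) {
  return a == LockMode::Shared && b == LockMode::Shared;
}

// FIFO: walk the wait queue in arrival order and grant requests until one
// conflicts with a lock that is already granted; everything behind that
// waiter keeps waiting, even if it would have been compatible.
static void grant_fifo(std::deque<LockRequest> &wait_queue,
                       std::vector<LockRequest> &granted) {
  while (!wait_queue.empty()) {
    const LockRequest &next = wait_queue.front();
    for (const LockRequest &g : granted) {
      if (!compatible(next.mode, g.mode)) return;  // blocked: stop here
    }
    granted.push_back(next);
    wait_queue.pop_front();
  }
}

int main() {
  std::vector<LockRequest> granted = {{1, LockMode::Shared}};
  std::deque<LockRequest> waiting = {{2, LockMode::Shared},
                                     {3, LockMode::Exclusive},
                                     {4, LockMode::Shared}};
  grant_fifo(waiting, granted);
  // t2 is granted (shared is compatible); t3 conflicts, so t3 and t4 keep waiting.
  std::cout << "granted: " << granted.size()
            << ", still waiting: " << waiting.size() << "\n";  // granted: 2, still waiting: 2
}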

Recently, a research group at the University of Michigan pointed out that behind this seemingly simple question lies an excellent opportunity for performance improvement. Professor Mozafari and his PhD students showed that different methods for lock scheduling have significant implications on the overall performance of a transactional database. They invented a new algorithm, Contention-Aware Transaction Scheduling (CATS), which can dramatically reduce latency and increase throughput when used in place of a FIFO approach.

The MySQL team at Oracle has been working closely with the inventors of CATS (Mozafari and his team), making MySQL the first database to integrate this new technology. Starting with MySQL 8.0.3, CATS will now be the default scheduling algorithm in InnoDB, which means all MySQL users will experience a dramatic boost in performance, especially for contentious workloads.

How Does CATS Work?

The CATS algorithm is based on a simple intuition: not all transactions are equal, and not all objects are equal. When a transaction already holds locks on many popular objects, it should get priority when it requests a new lock. In other words, unblocking such a transaction will indirectly contribute to unblocking many more transactions in the system, which means higher throughput and lower latency overall.

Here’s an analogy: if a cab driver and a bus driver are waiting in a line for coffee, making coffee for the bus driver first (even if s/he arrived after the cab driver) will allow more people to arrive at their destination sooner (since there are more passengers on the bus than in the cab). While this might seem unfair for the cab driver, because of the complex inter-dependencies, this strategy would make the entire system so much faster that everyone (even the cab driver!) would benefit from a more efficient service.

Of course, we’re dealing with locks and transactions and not busy transportation drivers. Let’s use a toy example to illustrate how CATS works in the world of databases. We know that transactions must acquire locks on every data object before reading or updating it, depending on the isolation level. If there are incompatible locks already held on that object (say by other transactions), the current transaction will be blocked until those other locks are released. The transactions that already have a lock on this object may themselves get blocked on other objects (the deadlock detection/prevention mechanism will ensure that there is no loop here). Consider the following figure.

[Figure: transaction contention. Transactions t1 and t2 both wait for a lock on object O1; t1 directly or indirectly blocks four other transactions, while t2 blocks three.]

In this situation, FIFO would decide by simply looking at which transaction requested O1 first. However, the CATS algorithm does something smarter: it counts the total number of transactions that are blocked by each transaction (directly or indirectly) and grants the lock to the one that is blocking more transactions. In this case, there are four transactions blocked by t1 and three transactions blocked by t2. So, CATS would grant the lock to t1 first, which will enable more transactions (four, rather than three) to be unblocked sooner, and thus allow for greater and faster progress in the system.
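The counting step can be illustrated with a toy sketch. This is not InnoDB's implementation; the WaitGraph type, the blocked_count function, and the particular graph shape below are assumptions chosen only so that the totals match the figure (four transactions behind t1, three behind t2). The idea: given the acyclic wait-for relationships, the weight of a transaction is the number of transactions it blocks, directly or transitively.

#include <cstddef>
#include <iostream>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// waiters_of[t] lists the transactions directly blocked by transaction t.
// Deadlock detection guarantees these relationships contain no cycles.
using WaitGraph = std::unordered_map<int, std::vector<int>>;

// Count the distinct transactions blocked by `trx`, directly or indirectly,
// by walking the "who waits on me" edges with a depth-first traversal.
static std::size_t blocked_count(const WaitGraph &waiters_of, int trx) {
  std::unordered_set<int> seen;
  std::vector<int> stack{trx};
  while (!stack.empty()) {
    int t = stack.back();
    stack.pop_back();
    auto it = waiters_of.find(t);
    if (it == waiters_of.end()) continue;
    for (int w : it->second) {
      if (seen.insert(w).second) stack.push_back(w);
    }
  }
  return seen.size();
}

int main() {
  // One hypothetical graph with the same totals as the figure:
  // four transactions behind t1, three behind t2.
  WaitGraph waiters_of = {{1, {3, 4}}, {3, {5, 6}},   // t3..t6 behind t1
                          {2, {7}},    {7, {8, 9}}};  // t7..t9 behind t2
  std::cout << "t1 blocks " << blocked_count(waiters_of, 1) << "\n";  // 4
  std::cout << "t2 blocks " << blocked_count(waiters_of, 2) << "\n";  // 3
  // CATS grants the contended lock on O1 to t1, the heavier waiter.
  return 0;
}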

For shared locks, CATS will also try to grant as many locks as possible. The difference here is that FIFO considers waiters in their order of arrival and stops granting as soon as it encounters an incompatible (exclusive) lock request, whereas CATS sorts the waiting transactions in descending order of the number of transactions they block.
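One way to read that description, as a hedged sketch rather than InnoDB's actual code (the Waiter struct and its blocked_count field are illustrative names, and removing granted waiters from the queue is omitted for brevity): waiters are sorted by how many transactions they block, and each one that is compatible with everything granted so far receives the lock.

#include <algorithm>
#include <cstddef>
#include <iostream>
#include <vector>

enum class LockMode { Shared, Exclusive };

struct Waiter {
  int trx_id;
  LockMode mode;
  std::size_t blocked_count;  // how many transactions this waiter blocks
};

static bool compatible(LockMode a, LockMode b) {
  return a == LockMode::Shared && b == LockMode::Shared;
}

// CATS-style grant pass: consider waiters in descending order of the number
// of transactions they block, and grant each one that is compatible with the
// locks granted so far, rather than stopping at the first conflict in
// arrival order the way FIFO does.
static void grant_cats(std::vector<Waiter> waiters,
                       std::vector<Waiter> &granted) {
  std::sort(waiters.begin(), waiters.end(),
            [](const Waiter &a, const Waiter &b) {
              return a.blocked_count > b.blocked_count;
            });
  for (const Waiter &w : waiters) {
    bool ok = true;
    for (const Waiter &g : granted) {
      if (!compatible(w.mode, g.mode)) { ok = false; break; }
    }
    if (ok) granted.push_back(w);
  }
}

int main() {
  std::vector<Waiter> granted;  // no locks granted yet on this object
  std::vector<Waiter> waiters = {{10, LockMode::Shared, 1},
                                 {11, LockMode::Exclusive, 5},
                                 {12, LockMode::Shared, 2}};
  grant_cats(waiters, granted);
  // t11 blocks the most transactions, so its exclusive request is granted first;
  // the shared requests then conflict with it and must keep waiting.
  std::cout << "granted " << granted.size() << " lock(s), first to trx "
            << granted.front().trx_id << "\n";  // granted 1 lock(s), first to trx 11
}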

Performance Improvement

Dimitri Kravtchuk from the Oracle team has tested this new algorithm extensively against the Sysbench workload under the “OLTP_RW-pareto” scenario on both high-end and low-end hardware. Results show that, when there is contention in the system, CATS delivers significantly higher performance than FIFO in terms of transactions per second (TPS), mean latency, and 95th percentile latency (see below). Interestingly, even when there is no contention, CATS still delivers the same performance as FIFO. This is because when there is no contention, there is no scheduling decision to be made (i.e., there is at most one transaction waiting for each lock). In other words, by using CATS instead of FIFO, you won’t lose anything, but stand to gain a lot as soon as there is contention in the system.

CATS vs. FIFO in TPS, mean latency and 95th percentile (up to 5.05x improvement)

 

# of users | TPS (FIFO / CATS / Improvement) | Mean Latency, ms (FIFO / CATS / Improvement) | 95th Percentile, ms (FIFO / CATS / Improvement)
32 | 13187 / 13127 / 1.00x | 2.42 / 2.43 / 1.00x | 2.82 / 2.79 / 1.01x
64 | 20109 / 20155 / 1.00x | 3.18 / 3.18 / 1.00x | 5.95 / 5.75 / 1.03x
128 | 11705 / 19741 / 1.69x | 10.93 / 6.49 / 1.68x | 42.53 / 18.78 / 2.26x
256 | 6237 / 18770 / 3.01x | 41.01 / 13.63 / 3.01x | 211.44 / 80.83 / 2.62x
512 | 2936 / 14836 / 5.05x | 174.07 / 34.49 / 5.05x | 1007.32 / 214.89 / 4.69x

Conclusion

MySQL is one of the first databases in the world to adopt this state-of-the-art transaction scheduling technology, Contention-Aware Transaction Scheduling (CATS). This algorithm solves the issue of rapid performance degradation that many face when the load on their database system increases, which is one of the main problems MySQL 8.0 aims to resolve.

CATS (or VATS as it was previously called) kicks in once the number of waiting transactions exceeds 32. There is no configuration option. The value of 32 was chosen after empirical testing.
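In code terms, the switch can be pictured roughly as below. This is a hypothetical illustration only: the constant kCatsThreshold and the function choose_policy are made-up names, and the real decision is taken inside InnoDB's lock system. Only the threshold value of 32 comes from the article.

#include <cstddef>
#include <iostream>

// The article states the empirically chosen threshold is 32 waiting transactions.
constexpr std::size_t kCatsThreshold = 32;

enum class SchedulerPolicy { Fifo, Cats };

// Choose the scheduling policy based on how many transactions are currently
// waiting for locks; below the threshold, plain FIFO is cheap and sufficient.
SchedulerPolicy choose_policy(std::size_t waiting_transactions) {
  return waiting_transactions > kCatsThreshold ? SchedulerPolicy::Cats
                                               : SchedulerPolicy::Fifo;
}

int main() {
  std::cout << (choose_policy(10) == SchedulerPolicy::Fifo) << " "
            << (choose_policy(100) == SchedulerPolicy::Cats) << "\n";  // 1 1
}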

 

About Jiamin Huang – Jiamin Huang is a Ph.D. student in the database group at the University of Michigan. His research focuses on improving the performance of database systems. He is advised by Professor Barzan Mozafari.

10 thoughts on “Contention-Aware Transaction Scheduling Arriving in InnoDB to Boost Performance”

  1. Hi,

    Do you handle the starvation problem? In your example with the bus and the cab: serving the bus first makes sense, but if there is always a queue with a couple of buses, you never get to the cab. Or is this resolved naturally in this case?

    I see the numbers for Sysbench, but this is obviously a pretty trivial workload.

    1. This is a great question. Thanks for asking it!
      The bus/cab analogy is intentionally over-simplified. In reality, you cannot have an infinite stream of buses arrive, because buses do not “arrive”: they all start out as cabs and turn into buses over time as they acquire more locks! In other words, everyone starts as a cab. A more precise analogy would be that all you have are bus drivers, but some buses are fuller than others. Every bus is initially empty of passengers (i.e., locks)…

      But in theory, starvation is possible under some really rare settings. This is why we also have a few mechanisms in CATS (e.g., keeping track of transactions with long lock wait times and boosting their priority, or putting a barrier in the lock queue from time to time and ensuring that all locks before the barrier are granted before those after it are considered). Some of these mechanisms will be introduced in upcoming releases of MySQL.

      Regarding Sysbench being simple, we have conducted extensive experiments on several other benchmarks as well. Please see these papers:

      http://web.eecs.umich.edu/~mozafari/php/data/uploads/sigmod_2017_predictability.pdf

      http://web.eecs.umich.edu/~mozafari/php/data/uploads/lock-schd-report.pdf

  2. Congratulations on the great article, Sunny! An excellent read.

    A few observations as I went through the article:

    A) Contention between transactions seems to increase only after 64 clients, and since the magic number of waiting transactions required for CATS to become active is 32, that is possibly why the benefits are realized only beyond 64 clients. Making the CATS threshold tunable below the current 32 might produce additional interesting data.

    B) CATS may be biased towards complex transactions, resulting in higher latencies for simple transactions; if additional weight were given to simpler transactions that have been waiting, it might make scheduling fairer for heavily loaded MySQL systems.

    – Thanks, Joy Das

    1. Hi Joy,

      You are absolutely right that the performance advantage of CATS increases with contention. However, technically speaking, we also see improvements under 32 clients. The results we had are as follows:

      # of waiting threads: VATS vs. FIFO (throughput)
      2: 4190 vs 4193
      4: 8299 vs 8311
      16: 13187 vs 13127
      32: 20109 vs 20155
      64: 11905 vs 19741

      The slight change in performance at 32 clients was most likely due to statistical noise/jitter.

  3. Sorry, the numbers above are a bit off. Let me elaborate. We do observe an improvement under 32 clients for CATS, just not in average latency: we saw a very small increase in average latency with 32 clients for CATS (2.42 to 2.43).

    All numbers below are FCFS vs. CATS

    # of waiting threads | throughput | 95th percentile
    8 | 4190 vs. 4193 | 2.42 vs. 2.41
    16 | 8299 vs. 8311 | 2.24 vs. 2.23
    32 | 13187 vs. 13127 | 2.82 vs. 2.79
    64 | 20109 vs. 20155 | 5.95 vs. 5.75
