MySQL 8.0: Retiring Support for the Query Cache

As Rene wrote on the ProxySQL blog yesterday:

Although MySQL Query Cache was meant to improve performance, it has serious scalability issues and it can easily become a severe bottleneck.

This is indeed something we have observed in the MySQL team for a while. Before we get to the subject of today’s post, let me start with an introduction.

Introduction to Query Cache

The MySQL query cache is a query results cache. It compares incoming queries that start with SEL to a hash table, and if there is a match returns the results from the previous execution of the query. There are some restrictions:

  • The query must match byte-for-byte (the query cache avoids parsing)
  • Use of non-deterministic features will result in the query not being cached (including temporary tables, user variables, RAND(), NOW() and UDFs.)
  • The query cache was designed to not serve stale results.  Any modification to the underlying table(s) results in all cache being invalidated for those tables.
  • There are some restrictions on if the cache can be used for InnoDB (to respect MVCC; as you have a transaction open, the ‘cache’ might not represent the data in your expected view.)

The Best Case Scenario

As I wrote on my personal blog some years ago:

The ideal scenario for the query cache tends to be largely read-only, where there are a number of very expensive queries which examine millions of rows only to return a few. A hypothetical example might be a complex query to build a list of values for a drop-down list that always appears on a webpage form. In a situation like this, the query cache can mask performance problems caused by missing indexes, which makes it helpful for novice users.

This comment still yields true today, but I think it is important to also point out that DBA tools for poor-application-intervention have also improved:

  • In the MySQL server, we now have the ability to rewrite queries to insert hints (or other modifications to improve performance)
  • We have third-party tools like ProxySQL, which can act as a man-in-the-middle query cache. ProxySQL also supports a TTL for cache, which works fine in the example I provided earlier (build a list of values for a drop-down list).

Limitations of the Query Cache

The query cache has been disabled-by-default since MySQL 5.6 (2013) as it is known to not scale with high-throughput workloads on multi-core machines. Rene confirmed this in his post yesterday, but it has also previously been mentioned by Stewart SmithDomas Mituzas (update: and Kristian Koehntopp).

Assuming that scalability could be improved, the limiting factor of the query cache is that since only queries that hit the cache will see improvement; it is unlikely to improve predictability of performance.  For user facing systems, reducing the variability of performance is often more important than improving peak throughput:

Slide 2/47 from A Top-Down Approach to Achieving Performance Predictability in Database Systems.
Slide 3/47 from A Top-Down Approach to Achieving Performance Predictability in Database Systems.

Decision to Remove Support for the Query Cache

We concur with the research performed by Jiamin Huang, Barzan Mozafari, Grant Schoenebeck, Thomas F. Wenisch at the University of Michigan, Ann Arbor.  We considered what improvements we could make to query cache versus optimizations that we could make which provide improvements to all workloads.

While these choices themselves are orthogonal, engineering resources are finite.  That is to say that we are shifting strategy to invest in improvements that are more generally applicable to all workloads.

We also agree with Rene’s conclusion, that caching provides the greatest benefit when it is moved closer to the client:

The “Client + 2x ProxySQL” results showing a 5.2x performance improvement when moving the cache to the client.

Upgrade Path for Existing Users

With the current limitations noted, the query cache will continue to be supported for the life time of MySQL 5.7.  MySQL 8.0 will not support query cache, and users upgrading will be encouraged to use either Server-side Query Rewrite or ProxySQL as a man-in-the-middle cache.

We expect this change to only affect a small number of users, but if this concerns you, please reach out and get in touch!

Thank you for using MySQL!

Update: This post is being discussed on HackerNews

38 thoughts on “MySQL 8.0: Retiring Support for the Query Cache

  1. Morgan,

    This is good news, and I am very happy to see ProxySQL mentioned as one option for QC users when they upgrade.

    I do have a question on your choice of wording here. You always use the word support. Does this mean one of:
    – 8.0 will have the code that implements the query cache, but as current GA versions, it will be disabled by default, and Oracle won’t provide support for users that enable it
    – Same as the previous one, but the variables controlling the query cache will be disabled, making it impossible to use it
    – The code will be removed and 8.0 will be a completely query cache free version

    If it means none of the above, can you please explain to me what you mean by not supported?


    1. Hi Fernando,

      It means that time permitting we will remove all code right down to the #ifdefs (as we have for embedded server for example).

      From a user perspective, that is a more straight forward question – we do plan to remove the various options.

  2. In our experience, caching challenges consist of 1) Cache invalidations and 2) Network latency. ProxySQL solves the latency issue as MySQL caching is on the application server so that DB doesn’t even see the queries via client-side caching. I like Rene’s solution.
    Heimdall Data offers a similar solution. However, it auto-caches, auto-invalidates, and clusters cache nodes into a single view.
    Both ProxySQL and Heimdall Data are good alternative solutions to the MySQL query cache. Both are cache solutions are also free for download.

      1. Bit late now, but perhaps look at ConcurrencyKit? that has a lock-free hash. They scale extremely well.

        Lock-free balanced binary trees exist in the literature now, too, although I’ve yet to find an open source implementation. They scale extremely well too, as the inherent nature of the binary tree disperses memory accesses. The TLB misses of course are painful, but infinitely less painful than locking. I see lock-free binary trees being two orders of magnitude faster than locking binary trees on 16 cores. They’re twice as fast even on two cores, and slightly faster on just one.

        1. The other things which is nice about lock-free is you get a much better distribution of work-load across cores.

          Locking mechanisms can be quite skewed, favouring one or a couple of particular cores.

        2. Improving the scalability of the hash is one of the paths we evaluated (removing the global lock; reducing fragmentation issues).

          Our ultimate path here was to take more of a longer term view – prioritize optimizations that are more generic to all workloads.

    1. Hi Jocelyn,

      The benchmark I point to here was from Rene’s post, but yes – I have seen higher throughput with lower values.

      However, the point about variance still stands. It is rare for a 99th percentile to improve with query cache. Performance is about improving the quality of consistency 🙂

      A non-p99 improvement is like going to a restaurant and we order the same dish, but mine is spectacular; and yours is way over salted.

  3. Apples to Pears comparisons when comparing server-side caching to client-side caching:

    – if all clients execute a common set of queries, having cache sit next to the client means wasting memory, as the same result set will be cached N times instead of 1

    – the main difference between server-side caching and client-side caching is *transparency*. I don’t give a ### if the cache in the client (or in a proxy) gives me better perfs, as it is not transparent to the application. Since it only has ttl-based invalidation, it can only be used by the application if the app is designed to tolerate stale/inconsistent results. But if I have to design my app with a caching layer in mind, I might then just use memcached or in-memory caches, and eschew the db result set cache – even more efficient! The great thing about mysql query cache is that it does do automatic cache invalidation which is transparent to the application (I have never met a scenario where it messed up because of transactions).

    1. server-side caching and client-side caching are indeed two different solutions, with their pros and cons, not necessarily one better than the other. Truly Apples to Pears comparisons.

      Caching client side doesn’t necessarily mean “wasting” more memory than server side. Unless you have only one database server (no reads from replicas?), also with server side caching you will likely have the same result cached N times instead of 1.
      “wasting memory” client side may become “using memory” if the client has unused memory: many applications are CPU intensive, and not memory intensive, so it is not unusual to have unused RAM there.

      But I think it all goes down to be able to move the caching layer as it fits better. The caching layer should be between the client(s) and the server(s), and you should be able to dynamically move it closer to the clients or to the servers. The caching layer itself can be sharded (some results cached in a shard, while other results cached in another shard), or even multi-layer.

      It is unlikely that all your clients and servers are deployed in the same rack. If they are distributed across multiple AZ or regions, network latency can easily add 70-100ms to every single request: would you prefer a cache closer to the server (hundreds ms), or closer to the client (hundreds microseconds on the same network, or few microseconds if on the same box)?
      You should be able to choice where your caching is.

      ProxySQL caching is indeed transparent to the application: the application won’t be aware that a caching layer exists. On the other hand, memcached is not transparent to the application: application needs to be aware of it.

      Bonus note: speaking about “wasting”, I know of whom, thanks to ProxySQL’s QC is going to remove “LOT OF replicas that I won’t need anymore” (quote). That will result in a lot of saving.

    2. For Web scaling, we have found client-side caching to be very beneficial as it stores the hot objects in local memory. App server memory utilization has been low at 50mb-250mb. Less frequently used content can be stored in a grid-cache (i.e. Redis, Memcache) as a tier 2 cache. The goal is to offload MySQL processing so that you can increase DB scale and improve application response times.
      People often neglect that up to 20% of application latency is at the backend, which is controllable. Caching in the application servers will increase response times if configured correctly. If the query hits the DB to be cached, you did not avoid the round trip time.

  4. Your comments about caching I believe are very accurate–people don’t in general want to risk data inconsistency, and ttl based invalidation really limits the use cases where caching can be used. Roland had mentioned the Heimdall Data solution, which works to cache on the client side, but behaves much like the QC in MySQL today, in that it detects changes to tables at the sql level, and invalidates based on this. It is also transaction aware, and while there are corner cases where in transactions things can get sticky, you can disable caching when a transaction is in process if these are issues in the application. The net result in our testing is that most common applications can get at least a 70% cache hit rate, and often higher.

  5. You mention that the query cache was disabled starting MySQL 5.6. But then you say it’s still supported in MySQL 5.7.

    So how do we enable it in 5.7?
    Of course, I tried with the query cache settings set to default values (1 MB & 16 MB) and with 0, but no difference.
    From what I measured in the benchmark for django-cachalot (a query cache for Django), the MySQL query cache was totally ineffective when switching from 5.5 to 5.7.

    Here are the benchmark results:
    The benchmark code itself is in the django-cachalot source if you want to see how it’s written.

  6. I think it’s great to remove this feature in 8.0, but FWIW it comes with some pain when you remove a feature suddenly without going through at least one major version with it deprecated.

    JetBrains DataGrip is sort of an IDE for databases, and it apparently twiddles with variables like query_cache_type as it runs queries.

    I’ve logged a bug with JetBrains:

  7. I understand the reasoning behind this change; however, I disagree with how it was removed. MySQL devs should have at least given users an option to implement their own “semi-intelligent” query cache of their own – which starts with at least knowing if a table has been recently changed. This is not even reliably possible in mysql 8 as INFORMATION_SCHEMA.TABLES.UPDATE_TIME is not reliably kept updated.

    Therefore, the only fallback is a dumb TTL for cached data (what proxysql uses).

    At least give us something to go off of to implement alternative solutions for a feature which has been part of MySQL for many years.

    1. MySQL 8.0 comes with several improvements to INFORMATION_SCHEMA. It enables dynamic table statistics to be cached, which is enabled by default. The session variable “information_schema_stats_expiry” can be used to control the behavior. Setting this to ‘0’ seconds should help see latest value of “INFORMATION_SCHEMA.TABLES.UPDATE_TIME”.
      Please see following documentation for more information.

Leave a Reply