Plan to improve the out of the Box Experience in MySQL 8.0

In MySQL 8.0, we will be introducing a new configuration parameter called innodb_dedicated_server=bool. When ON, this option will look at the system memory, and then automatically set the these configuration parameters using the following rules:

When innodb_dedicated_server is OFF, the configuration will be as it is today. That is to say that it is not specifically a minimal configuration, but also not optimal for larger memory systems.

Background

The goal of innodb dedicated server is to improve the out-of-the-box experience when a user does not make any changes to configuration. As per our documentation, the current behavior is that:

The default configuration is designed to permit a MySQL server to start on a virtual machine that has approximately 512MB of RAM.

The 512M number is chosen to match the memory seen on entry-level cloud instances, which has not seen any increases over the years.

If (for example) a user starts MySQL on an instance with 64GB of RAM, the same small footprint will be used. This will result in a very tiny innodb-buffer-pool-size of 128M, when the recommended advice is to use 50-75% of system memory.

Which leads us to the question, could we target the resource consumption to match the host?

Problem #1: Deciding when to Target

The first issue to mention with targeting resource consumption to match the host, is that this is not always desired. Consider the use cases of running MySQL on your desktop, or running it on a server alongside a webserver.

In these cases it is not a safe assumption to set the buffer pool to 50-75% of system memory.

Problem #2: Deciding how to Target

If we have established cases where it is not safe to use 50-75% of system memory, maybe you could then set a percentage of memory that MySQL can safely use?

We thought about this, and it has two problems:

  1. If you say “MySQL can use 60%”, then there would have to be some math to determine the buffer pool.  For example:

    Our concern is that taking 60% of 75% might get confusing (KISS principle).
  2. By the time you are asking a user to chose a number between 1% and 100%, you are already asking for configuration options. This quickly starts to defeat the purpose of this feature, which is to decide what to do in the case a user has not specified any configuration.

So the most important decision is only whether or not this server is intended to be a dedicated MySQL Server. For the other cases, it is reasonable to expect a user to adjust the configuration settings for their specific circumstances.

Problem #3: Asking an Installer Question

So we’ve arrived at this option being a boolean (is it dedicated or not?), but we did’t really answer the question of whether or not it is appropriate (i.e. do other programs have to run on this instance). On Mac OS X and Windows, we can now ask if this is a dedicated MySQL Server or not in the installers:

Would you like MySQL to use all available memory on this system? [ Yes / No ]
Selecting ‘Yes’ will improve overall MySQL performance. You may change this decision at any point by editing the innodb_dedicated_server option in your MySQL configuration file.

The problem is that on Linux systems, we can not guarantee that our RPM and DEB packages are able to ask questions. Which gets us to the next problem.

Problem #4: Selecting a Default

Because we can not reliably ask questions, we need to have a default to fallback to when the user has not specified anything. Since this default will be selected when the user has not specified any configuration, it quickly puts you in a chicken-and-egg situation:

i.e. if the user had to set that the server should be dedicated, they’ve made changes to configuration. At this point couldn’t they configure the buffer pool, log file and flush method anyway?

That is to say that the true value delivered by this feature requires it to default ON. When OFF, it only achieves a level of indirection in making the question slightly easier to answer. The out-of-the-box experience is not actually improved.

Side note: Another way this problem could be addressed is by having different defaults for different operating systems, or by having a ‘mysql-production’ (innodb-dedicated-server=ON) and a ‘mysql-development’ (innodb-dedicated-server=OFF) package.  This creates a new set of problems long enough for its own blog post :-).  The short answer is that we would prefer not to do this.

Status Report on MySQL 8.0.3

In MySQL 8.0.3, the feature innodb-dedicated-server will ship as described, but the default value will be set to OFF (for now). This is quite a big change, and there are some corner cases which we need to figure out the best way of moving forward. For example:

    • If an installer (which can’t ask questions) auto-starts MySQL after install, it could result in an OOM situation which prevents MySQL from starting. For the same ease-of-use motivation, we don’t want to expect a user to tail a log file and have to resolve this issue themselves.
    • Defaulting to 2048MB for innodb-log-file-size is a >20 times increase over MySQL 5.7. We do not believe we can reliably detect free space on a partition in a cross platform way, and incorporate this into our calculation.
    • The dedicated-server value for innodb-log-file-size is slightly arbitrary. It does not not specifically relate to the amount of memory on the system, or the size of the buffer pool. For example: a workload that keeps updating the same pages may need a larger redo log than a workload with a bigger buffer pool. Maybe we can reduce the risk of out of space issues by changing the max value to 512M?
    • Maybe we can find a middle ground for innodb-buffer-pool-size too, where instead of taking up to 75% of system memory, it takes up to 50%? This makes it somewhat more cooperative with other processes, while still measurably better than the default. This feature was never intended as a complete replacement for configuration, and so being conservative with values is fine.

Seeking Feedback

Our intention is to change the default to innodb-dedicated-server=ON before the GA release of MySQL 8.0. We’ve not gotten there yet; as we believe there is an opportunity to collect feedback, and improve upon how we resolve some of the known corner cases.

While for many of our advanced users, this feature may not have much impact (configuration settings can be changed!), I remind you to please consider the goal is to narrowly focused on out of the box experience.

Please let us know your thoughts, and thank you for helping improve MySQL!

31 thoughts on “Plan to improve the out of the Box Experience in MySQL 8.0

  1. Short response: “Yes!”

    Long response:

    I assume there will always be the possibility for explicit values in my.cnf to override the proposed formulas. This should be the case whether innodb_dedicated_server_pool is TRUE or FALSE. Turning off the boolean will revert to 128MB? Bad idea. Many DBAs will want control. Turning off should revert to an optional setting for innodb_buffer_pool_size, which defaults to 128MB or whatever.

    The formulas seem reasonably correct. Are they in my.cnf? Or baked into the code?

    innodb_buffer_pool_size…

    In the dim, dark, past, one of the main problems with InnoDB was the tiny default setting for the buffer_pool, together with the lack of any clue to the naive user that he should change it. Increasing the default shutdown the flood of Forum questions that were resolved mostly by changing the size.

    For a long time, my Rule of Thumb has been “70% of ‘available’ memory”. Lately, the upsurge of tiny cloud VMs has made that too simplistic. I have pondered (server-memory – 1G) * 75%, but that fails for really small VMs.

    Increasing the buffer_pool_size does not help after it is somewhat bigger than all the data & indexes. (This is to contradict the statement “will improve…”.)

    innodb_log_file_size…

    In looking over machines I have analyzed, 512MB seems to be “good enough” for nearly all. Anyone who has a machine that needs a bigger setting is already heavy into doing his own tuning, and will override the new forumla.

    Meanwhile…

    If SSD, turn off innodb_flush_neighbors. And change some “cost” constants for the Optimizer.

    If RAID (of certain types), turn off doublewrite (and maybe some other things).

    If Clustering (of certain types), turn off doublewrite and other things. Instead, plan on failing over.

    1. Hi Rick,

      > I assume there will always be the possibility for explicit values in my.cnf to override the proposed formulas. This should be the case whether innodb_dedicated_server_pool is TRUE or FALSE.

      That is correct. Explicit values take precedence.

      > Turning off the boolean will revert to 128MB? Bad idea. Many DBAs will want control. Turning off should revert to an optional setting for innodb_buffer_pool_size, which defaults to 128MB or whatever.

      I’m not sure I understand. Could you clarify here? It may relate to the next point (baked or not).

      > The formulas seem reasonably correct. Are they in my.cnf? Or baked into the code?

      None are baked. This is intentional to make it cloud and container friendly 🙂

      > In looking over machines I have analyzed, 512MB seems to be “good enough” for nearly all.

      Thank you for the feedback! And yes, if it is ‘good enough’ for most, then it fits in with the goal of this feature.

      > If SSD, turn off innodb_flush_neighbors. And change some “cost” constants for the Optimizer.

      We have a separate proposal for innodb_flush_neighbors, which is to turn it off for all installations (unrelated to dedicated-server), as the default assumption for 8.0 should be SSDs.

      The cost constants in 8.0 are improved to reflect cost of in memory vs on disk. I have examples on slides 56-60 here: https://www.slideshare.net/morgo/mysql-80-optimizer-guide/56

      > If RAID (of certain types), turn off doublewrite (and maybe some other things).
      > If Clustering (of certain types), turn off doublewrite and other things. Instead, plan on failing over.

      Our goal is to be safe by default or “opt-out”. We can detect if storage supports atomic writes and disable doublewrite, but it is unsafe for a general default 🙁

  2. I’m sorry, but I don’t like this feature at all. I understand the goal of improving the out-of-the-box experience for naive users, but I don’t think this solution will be successful at this goal.

    Trying to pre-tune a MySQL installation with some formula is a one-size-fits-all solution, and these kinds of solutions are unreliable. We can recall examples of other products that have tried to do this, but eventually removed their auto-tuning features.

    It’s not a good assumption that the buffer pool needs as much physical RAM as you can afford. You already know this, because you need the innodb_dedicated_server option. Rick mentioned the possibility that the dataset is already smaller than RAM. In this case, adding more RAM has little or no benefit.

    Many naive users mistakenly believe (after reading some blog) that increasing RAM allocation always increases performance. It’s difficult to explain to them why this is not true.

    Likewise innodb log file. We assume that bigger is better, because of benchmarks showing that heavy write traffic benefits from bigger log files, because of delaying checkpoints. But what if you don’t have heavy write traffic? What if you use MySQL for a blog or a CMS that is 99% reads? The large log file is unnecessary. Sizing it for an assumed workload or dataset size has a high chance of being the wrong choice for tuning.

    I understand the difficulty of asking users questions during installation. I recently did a project automating MySQL provisioning with apt. It was annoying having to figure out debconf to work around the installation prompts that do exist (btw, please document MySQL’s debconf variables!).

    There’s also the problem that even if you do prompt the user for information, they don’t know the answers to the questions. This is especially true of the naive users that you’re targeting with this feature.

    If the installer asks “Do you use MySQL on a dedicated server?” do they even know what this means? They might think “dedicated” is simply the opposite of shared hosting.

    If the installer asks “Do you want to use all available memory on this system?” you will be surprised at how many users think “memory” refers to disk space, not RAM.

    In short: (1) Using formulas to tune MySQL is error-prone. (2) Asking users to make choices without information is error-prone.

    I have an alternative suggestion: Make it easier for users to become less naive about their choices.

    I think users need a kind of friendly cheat-sheet or infographic of how to make tuning decisions. This could include a list of questions about their data size and workload, and then a list of performance indicators to monitor and measure, like buffer pool page create rate, and log file write rate. Give tips on how to measure these things, what config options to change, and then how to measure again to verify that the change had the desired effect.

    A simple monitoring tool would also be useful. Nothing so sophisticated as PMM or VividCortex for long-term trending, but something more like pt-mext for quick, ephemeral measurements.

    The only thing the installation process needs to do is tell the user that tuning is a thing they need to do (many users don’t realize this), and refer them to the cheat-sheet documentation.

    1. I agree with most of what Bill says. However, I think that a formula is better than a constant that is too low for most users. That is, the formula is moving in the right direction, though not a panacea.

      As for a friendly cheat sheet, I have been saying for a few years “set the buffer pool to 70% of available RAM; ignore other tunables until you have some reason to worry.” The main flaw is the meaning of “available”.

      The waste in having too-big a log for the 99% read system is less troubling than the too-small log that recycles every minute. Not so much “one size fits all” as “one formula that keeps more people out of trouble”.

  3. Hi Morgan.

    I think I fall into Rick’s camp of not being completely convinced about the proposal as you state it. The idea to provide a better “easy” out of the box experience is good, and also to try to base that on real settings in the environment is also good.

    However, these settings strike me as being configuration_profiles, so I’d be tempted to make the setting name: configuration_profile = default (the value used at the moment).

    That makes it much easier to have another profile “innodb_dedicated” and if we are a bit mad or maybe thinking of other forks etc (yes it’s important to not forget them) then “rocksdb_dedicated” or “tokudb_dedicated”, though again depending on how smart your “pre-startup profile configurator” is having the profile called “dedicated” and default_storage_engine = “innodb” basically should be enough. If the user changes their default storage engine then the profiler could adjust the settings appropriately, or if it’s not then let the user do it.

    It’s also clear that you provide some ideas of these default settings. If I have access to the way that you provide these profiles and I can provide my own then I can choose on _my_ systems maybe to use a buffer pool size of 80% of RAM size and give the profile name “dedicated_80_pct” etc.

    The comment about the redo log files sizes is good, but I think the values you propose for most people are probably too large. My default settings are 2 x 512MB, but I do have servers which have larger or much larger (2 x 16GB) values. Ideally the server could see through some “metric” that it needs to increase the size (basically due to “heavy insert pressure”) and could do it on the fly (up to some dynamically configured maximum) but that could happen later.

    So the idea you bring up feels good, but the dedicated name doesn’t even if the intention behind it is reasonable. If you provide 2 profiles (default, innodb_dedicated) then make it possible for the DBA to potentially add his own extra profiles as that will simplify his task. I’m sure most people have puppet , chef or ansible doing this and it works but if the server helps us by providing the right hooks those who don’t have such tools will be able to do something similar and in most cases that will probably be enough.

    1. Hi Simon,

      Thank you for the alternative proposal! It has some nice properties that the existing proposal does not in that it can accommodate a middle ground ‘balanced’ configuration as well.

      In response to this point specifically:

      > The comment about the redo log files sizes is good, but I think the values you propose for most people are probably too large. My default settings are 2 x 512MB, but I do have servers which have larger or much larger (2 x 16GB) values. Ideally the server could see through some “metric” that it needs to increase the size (basically due to “heavy insert pressure”) and could do it on the fly (up to some dynamically configured maximum) but that could happen later.

      The redo log is not currently dynamic, which would prevent such an adjustment unfortunately.

      It is a noble goal post to aim for, but there will be other issues in ‘learning’. i.e. It could over-tune for a side workload such as a batch-load that occurs out of hours (and has plenty of time to run). The scope we are looking at for now is out-of-the-box experience.

  4. Hi Morgan. Ok, but still the size you propose is large and in many cases may be larger than the DB. Don’t over configure. For small DBs defaulting to 2048 M (x2 ?) means your disk foot print is going to be a minimum of 4GB. Of course that’s not huge but if you make a filesystem copy of the database rather than a logical backup then the extra overhead is really overkill. 512MB (x 2) works fine and I think for most people would be more than enough.

  5. One final thought.

    While you are looking at the server hardware pay attention to the threads that are configured for different tasks:
    * innodb read/write threads (hint: these are still not dynamic)
    * slave_parallel_workers (!)
    * purge threads etc.

    There are several different settings that can be used and while it’s not necessarily easy to figure out exactly the right settings here, if you want to configure stuff based on “memory size” it also seems wise to do a similar calculation based on “cpus”.