Plan to improve the out of the Box Experience in MySQL 8.0

In MySQL 8.0, we will be introducing a new configuration parameter called innodb_dedicated_server=bool. When ON, this option will look at the system memory, and then automatically set the these configuration parameters using the following rules:

When innodb_dedicated_server is OFF, the configuration will be as it is today. That is to say that it is not specifically a minimal configuration, but also not optimal for larger memory systems.

Background

The goal of innodb dedicated server is to improve the out-of-the-box experience when a user does not make any changes to configuration. As per our documentation, the current behavior is that:

The default configuration is designed to permit a MySQL server to start on a virtual machine that has approximately 512MB of RAM.

The 512M number is chosen to match the memory seen on entry-level cloud instances, which has not seen any increases over the years.

If (for example) a user starts MySQL on an instance with 64GB of RAM, the same small footprint will be used. This will result in a very tiny innodb-buffer-pool-size of 128M, when the recommended advice is to use 50-75% of system memory.

Which leads us to the question, could we target the resource consumption to match the host?

Problem #1: Deciding when to Target

The first issue to mention with targeting resource consumption to match the host, is that this is not always desired. Consider the use cases of running MySQL on your desktop, or running it on a server alongside a webserver.

In these cases it is not a safe assumption to set the buffer pool to 50-75% of system memory.

Problem #2: Deciding how to Target

If we have established cases where it is not safe to use 50-75% of system memory, maybe you could then set a percentage of memory that MySQL can safely use?

We thought about this, and it has two problems:

  1. If you say “MySQL can use 60%”, then there would have to be some math to determine the buffer pool.  For example:

    Our concern is that taking 60% of 75% might get confusing (KISS principle).
  2. By the time you are asking a user to chose a number between 1% and 100%, you are already asking for configuration options. This quickly starts to defeat the purpose of this feature, which is to decide what to do in the case a user has not specified any configuration.

So the most important decision is only whether or not this server is intended to be a dedicated MySQL Server. For the other cases, it is reasonable to expect a user to adjust the configuration settings for their specific circumstances.

Problem #3: Asking an Installer Question

So we’ve arrived at this option being a boolean (is it dedicated or not?), but we did’t really answer the question of whether or not it is appropriate (i.e. do other programs have to run on this instance). On Mac OS X and Windows, we can now ask if this is a dedicated MySQL Server or not in the installers:

Would you like MySQL to use all available memory on this system? [ Yes / No ]
Selecting ‘Yes’ will improve overall MySQL performance. You may change this decision at any point by editing the innodb_dedicated_server option in your MySQL configuration file.

The problem is that on Linux systems, we can not guarantee that our RPM and DEB packages are able to ask questions. Which gets us to the next problem.

Problem #4: Selecting a Default

Because we can not reliably ask questions, we need to have a default to fallback to when the user has not specified anything. Since this default will be selected when the user has not specified any configuration, it quickly puts you in a chicken-and-egg situation:

i.e. if the user had to set that the server should be dedicated, they’ve made changes to configuration. At this point couldn’t they configure the buffer pool, log file and flush method anyway?

That is to say that the true value delivered by this feature requires it to default ON. When OFF, it only achieves a level of indirection in making the question slightly easier to answer. The out-of-the-box experience is not actually improved.

Side note: Another way this problem could be addressed is by having different defaults for different operating systems, or by having a ‘mysql-production’ (innodb-dedicated-server=ON) and a ‘mysql-development’ (innodb-dedicated-server=OFF) package.  This creates a new set of problems long enough for its own blog post :-).  The short answer is that we would prefer not to do this.

Status Report on MySQL 8.0.3

In MySQL 8.0.3, the feature innodb-dedicated-server will ship as described, but the default value will be set to OFF (for now). This is quite a big change, and there are some corner cases which we need to figure out the best way of moving forward. For example:

    • If an installer (which can’t ask questions) auto-starts MySQL after install, it could result in an OOM situation which prevents MySQL from starting. For the same ease-of-use motivation, we don’t want to expect a user to tail a log file and have to resolve this issue themselves.
    • Defaulting to 2048MB for innodb-log-file-size is a >20 times increase over MySQL 5.7. We do not believe we can reliably detect free space on a partition in a cross platform way, and incorporate this into our calculation.
    • The dedicated-server value for innodb-log-file-size is slightly arbitrary. It does not not specifically relate to the amount of memory on the system, or the size of the buffer pool. For example: a workload that keeps updating the same pages may need a larger redo log than a workload with a bigger buffer pool. Maybe we can reduce the risk of out of space issues by changing the max value to 512M?
    • Maybe we can find a middle ground for innodb-buffer-pool-size too, where instead of taking up to 75% of system memory, it takes up to 50%? This makes it somewhat more cooperative with other processes, while still measurably better than the default. This feature was never intended as a complete replacement for configuration, and so being conservative with values is fine.

Seeking Feedback

Our intention is to change the default to innodb-dedicated-server=ON before the GA release of MySQL 8.0. We’ve not gotten there yet; as we believe there is an opportunity to collect feedback, and improve upon how we resolve some of the known corner cases.

While for many of our advanced users, this feature may not have much impact (configuration settings can be changed!), I remind you to please consider the goal is to narrowly focused on out of the box experience.

Please let us know your thoughts, and thank you for helping improve MySQL!

26 thoughts on “Plan to improve the out of the Box Experience in MySQL 8.0

  1. Short response: “Yes!”

    Long response:

    I assume there will always be the possibility for explicit values in my.cnf to override the proposed formulas. This should be the case whether innodb_dedicated_server_pool is TRUE or FALSE. Turning off the boolean will revert to 128MB? Bad idea. Many DBAs will want control. Turning off should revert to an optional setting for innodb_buffer_pool_size, which defaults to 128MB or whatever.

    The formulas seem reasonably correct. Are they in my.cnf? Or baked into the code?

    innodb_buffer_pool_size…

    In the dim, dark, past, one of the main problems with InnoDB was the tiny default setting for the buffer_pool, together with the lack of any clue to the naive user that he should change it. Increasing the default shutdown the flood of Forum questions that were resolved mostly by changing the size.

    For a long time, my Rule of Thumb has been “70% of ‘available’ memory”. Lately, the upsurge of tiny cloud VMs has made that too simplistic. I have pondered (server-memory – 1G) * 75%, but that fails for really small VMs.

    Increasing the buffer_pool_size does not help after it is somewhat bigger than all the data & indexes. (This is to contradict the statement “will improve…”.)

    innodb_log_file_size…

    In looking over machines I have analyzed, 512MB seems to be “good enough” for nearly all. Anyone who has a machine that needs a bigger setting is already heavy into doing his own tuning, and will override the new forumla.

    Meanwhile…

    If SSD, turn off innodb_flush_neighbors. And change some “cost” constants for the Optimizer.

    If RAID (of certain types), turn off doublewrite (and maybe some other things).

    If Clustering (of certain types), turn off doublewrite and other things. Instead, plan on failing over.

    1. Hi Rick,

      > I assume there will always be the possibility for explicit values in my.cnf to override the proposed formulas. This should be the case whether innodb_dedicated_server_pool is TRUE or FALSE.

      That is correct. Explicit values take precedence.

      > Turning off the boolean will revert to 128MB? Bad idea. Many DBAs will want control. Turning off should revert to an optional setting for innodb_buffer_pool_size, which defaults to 128MB or whatever.

      I’m not sure I understand. Could you clarify here? It may relate to the next point (baked or not).

      > The formulas seem reasonably correct. Are they in my.cnf? Or baked into the code?

      None are baked. This is intentional to make it cloud and container friendly 🙂

      > In looking over machines I have analyzed, 512MB seems to be “good enough” for nearly all.

      Thank you for the feedback! And yes, if it is ‘good enough’ for most, then it fits in with the goal of this feature.

      > If SSD, turn off innodb_flush_neighbors. And change some “cost” constants for the Optimizer.

      We have a separate proposal for innodb_flush_neighbors, which is to turn it off for all installations (unrelated to dedicated-server), as the default assumption for 8.0 should be SSDs.

      The cost constants in 8.0 are improved to reflect cost of in memory vs on disk. I have examples on slides 56-60 here: https://www.slideshare.net/morgo/mysql-80-optimizer-guide/56

      > If RAID (of certain types), turn off doublewrite (and maybe some other things).
      > If Clustering (of certain types), turn off doublewrite and other things. Instead, plan on failing over.

      Our goal is to be safe by default or “opt-out”. We can detect if storage supports atomic writes and disable doublewrite, but it is unsafe for a general default 🙁

  2. I’m sorry, but I don’t like this feature at all. I understand the goal of improving the out-of-the-box experience for naive users, but I don’t think this solution will be successful at this goal.

    Trying to pre-tune a MySQL installation with some formula is a one-size-fits-all solution, and these kinds of solutions are unreliable. We can recall examples of other products that have tried to do this, but eventually removed their auto-tuning features.

    It’s not a good assumption that the buffer pool needs as much physical RAM as you can afford. You already know this, because you need the innodb_dedicated_server option. Rick mentioned the possibility that the dataset is already smaller than RAM. In this case, adding more RAM has little or no benefit.

    Many naive users mistakenly believe (after reading some blog) that increasing RAM allocation always increases performance. It’s difficult to explain to them why this is not true.

    Likewise innodb log file. We assume that bigger is better, because of benchmarks showing that heavy write traffic benefits from bigger log files, because of delaying checkpoints. But what if you don’t have heavy write traffic? What if you use MySQL for a blog or a CMS that is 99% reads? The large log file is unnecessary. Sizing it for an assumed workload or dataset size has a high chance of being the wrong choice for tuning.

    I understand the difficulty of asking users questions during installation. I recently did a project automating MySQL provisioning with apt. It was annoying having to figure out debconf to work around the installation prompts that do exist (btw, please document MySQL’s debconf variables!).

    There’s also the problem that even if you do prompt the user for information, they don’t know the answers to the questions. This is especially true of the naive users that you’re targeting with this feature.

    If the installer asks “Do you use MySQL on a dedicated server?” do they even know what this means? They might think “dedicated” is simply the opposite of shared hosting.

    If the installer asks “Do you want to use all available memory on this system?” you will be surprised at how many users think “memory” refers to disk space, not RAM.

    In short: (1) Using formulas to tune MySQL is error-prone. (2) Asking users to make choices without information is error-prone.

    I have an alternative suggestion: Make it easier for users to become less naive about their choices.

    I think users need a kind of friendly cheat-sheet or infographic of how to make tuning decisions. This could include a list of questions about their data size and workload, and then a list of performance indicators to monitor and measure, like buffer pool page create rate, and log file write rate. Give tips on how to measure these things, what config options to change, and then how to measure again to verify that the change had the desired effect.

    A simple monitoring tool would also be useful. Nothing so sophisticated as PMM or VividCortex for long-term trending, but something more like pt-mext for quick, ephemeral measurements.

    The only thing the installation process needs to do is tell the user that tuning is a thing they need to do (many users don’t realize this), and refer them to the cheat-sheet documentation.

    1. I agree with most of what Bill says. However, I think that a formula is better than a constant that is too low for most users. That is, the formula is moving in the right direction, though not a panacea.

      As for a friendly cheat sheet, I have been saying for a few years “set the buffer pool to 70% of available RAM; ignore other tunables until you have some reason to worry.” The main flaw is the meaning of “available”.

      The waste in having too-big a log for the 99% read system is less troubling than the too-small log that recycles every minute. Not so much “one size fits all” as “one formula that keeps more people out of trouble”.

  3. Hi Morgan.

    I think I fall into Rick’s camp of not being completely convinced about the proposal as you state it. The idea to provide a better “easy” out of the box experience is good, and also to try to base that on real settings in the environment is also good.

    However, these settings strike me as being configuration_profiles, so I’d be tempted to make the setting name: configuration_profile = default (the value used at the moment).

    That makes it much easier to have another profile “innodb_dedicated” and if we are a bit mad or maybe thinking of other forks etc (yes it’s important to not forget them) then “rocksdb_dedicated” or “tokudb_dedicated”, though again depending on how smart your “pre-startup profile configurator” is having the profile called “dedicated” and default_storage_engine = “innodb” basically should be enough. If the user changes their default storage engine then the profiler could adjust the settings appropriately, or if it’s not then let the user do it.

    It’s also clear that you provide some ideas of these default settings. If I have access to the way that you provide these profiles and I can provide my own then I can choose on _my_ systems maybe to use a buffer pool size of 80% of RAM size and give the profile name “dedicated_80_pct” etc.

    The comment about the redo log files sizes is good, but I think the values you propose for most people are probably too large. My default settings are 2 x 512MB, but I do have servers which have larger or much larger (2 x 16GB) values. Ideally the server could see through some “metric” that it needs to increase the size (basically due to “heavy insert pressure”) and could do it on the fly (up to some dynamically configured maximum) but that could happen later.

    So the idea you bring up feels good, but the dedicated name doesn’t even if the intention behind it is reasonable. If you provide 2 profiles (default, innodb_dedicated) then make it possible for the DBA to potentially add his own extra profiles as that will simplify his task. I’m sure most people have puppet , chef or ansible doing this and it works but if the server helps us by providing the right hooks those who don’t have such tools will be able to do something similar and in most cases that will probably be enough.

    1. Hi Simon,

      Thank you for the alternative proposal! It has some nice properties that the existing proposal does not in that it can accommodate a middle ground ‘balanced’ configuration as well.

      In response to this point specifically:

      > The comment about the redo log files sizes is good, but I think the values you propose for most people are probably too large. My default settings are 2 x 512MB, but I do have servers which have larger or much larger (2 x 16GB) values. Ideally the server could see through some “metric” that it needs to increase the size (basically due to “heavy insert pressure”) and could do it on the fly (up to some dynamically configured maximum) but that could happen later.

      The redo log is not currently dynamic, which would prevent such an adjustment unfortunately.

      It is a noble goal post to aim for, but there will be other issues in ‘learning’. i.e. It could over-tune for a side workload such as a batch-load that occurs out of hours (and has plenty of time to run). The scope we are looking at for now is out-of-the-box experience.

  4. Hi Morgan. Ok, but still the size you propose is large and in many cases may be larger than the DB. Don’t over configure. For small DBs defaulting to 2048 M (x2 ?) means your disk foot print is going to be a minimum of 4GB. Of course that’s not huge but if you make a filesystem copy of the database rather than a logical backup then the extra overhead is really overkill. 512MB (x 2) works fine and I think for most people would be more than enough.

  5. One final thought.

    While you are looking at the server hardware pay attention to the threads that are configured for different tasks:
    * innodb read/write threads (hint: these are still not dynamic)
    * slave_parallel_workers (!)
    * purge threads etc.

    There are several different settings that can be used and while it’s not necessarily easy to figure out exactly the right settings here, if you want to configure stuff based on “memory size” it also seems wise to do a similar calculation based on “cpus”.

  6. > Our intention is to change the default to innodb-dedicated-server=ON before the GA release of MySQL 8.0.

    MySQL used as backend for many small applications on Linux desktops. Are you sure you want to use 75% on such machines?

    > innodb_flush_method = O_DIRECT_NO_FSYNC

    Are you sure this is safe option for everybody? Maybe makes sense to check which filesystem used?

    1. Hi Sveta,

      > MySQL used as backend for many small applications on Linux desktops. Are you sure you want to use 75% on such machines?

      No. See Problem #1.

      > Are you sure this is safe option for everybody? Maybe makes sense to check which filesystem used?

      We always try to have the same defaults everywhere. As I understand, the exception here is only for XFS.

  7. Hello Morgan,

    As stated, this proposal uses formulas based on the server_memory that is installed in a server. However, as Rick posited and has always been my experience that really innodb_buffer_size should be set to 50-75% of “available” memory. Leaving the rest of the available memory for allocated MySQL processes from connections.

    It would seem to me, that the formulas should instead be based off the free memory on the running system (regardless of VM, bare metals etc.). This would inherently take into account the memory use of any and all existing system processes and other default processes setup by a company’s ops staff for monitoring, logging, and other compliance processes that will always be running.

    I am surprised that MySQL would not move to something along the lines of Oracle’s SGA_TARGET/SGA_MAX_SIZE, that just sets a high water mark for the memory settings. As the actual memory allocated for different buffers in Oracle would be variable dynamically depending on the running systems’ profile. I would eventually love to see MySQL move in this direction as it would take into account the load profile of the applications that are using the database instance. The number of connections maintained by the application, how long lived they are, how much query work is performed by each connection when they are connected, are all factors that are not taken into account by simply setting a number based on a simplistic formula at installation/setup.

    Trying to make these innodb settings dynamic, may be rooted in some programmatic memory allocation issues I’m not aware of at the time, or how they could be broken through, but this in my opinion would bring MySQL closer to the end goal of being a better experience out of the box, without manual tuning.

    1. Hi Lee,

      Targeting available memory is more difficult. What if MySQL is the first application to start? (Edit: What I mean to say here is it is important for the ‘magic values’ to be deterministic.) Some of the settings being changed here are non-dynamic, so it requires a restart to change.

      MySQL 8.0.2 enabled memory instrumentation by default, which I see as a first step towards providing a wholistic view of current usage. But I see this as really an independent feature request 🙂

  8. It is a good idea to specify InnoDB’s memory size as a percentage of the memory of the server. However, it is strange that only innodb_dedicated_server has that function.

    I suggest firstly that we can specify innodb_buffer_pool_size as a percentage and set the default value to 25%.

    This makes it possible to effectively use memory even on servers with 64 GB of memory. We can also reduce the risk of invoking OOM Killer. Considering a two-node NUMA environment, the default value should be less than 50%.

    Then, if we turn on innodb_dedicated_server, Innodb_buffer_pool_size is changed to 75%. In this case it is better to make sure that Huge Pages is ON.

    1. Thank you for the suggestion of enabling huge pages when @ 75% 🙂 . We could also add memlock to that list.

      Because it is a big change, I think we will most likely keep the initial scope limited. But it’s certainly a possibility for the future.

  9. My 2 cents: Experienced DBAs know we have to change the default my.cnf/my.ini settings. Existing installations have a config file that won’t be overwritten by an upgrade.

    Given this, this change will only affect new installations, for DBAs who don’t change the default.

    I think it’s a reasonable starting point, and certainly will make it easier for the defaults to somewhat match the system.

    It also means we don’t have to wait 10 years for the defaults to change – IIRC currently innodb buffer pool defaulting to 128Mb, which might be appropriate for some cases, but isn’t for many cases. Assigning an arbitrary number (like 128M or 1Gb or whatever) means that when memory profiles change, the number becomes obsolete.

    The formula may not be perfect, but it’s much better than a hard-coded number.

    1. Hi Sheeri,

      Thank you for the input. Just wanted to clarify on this point:

      > It also means we don’t have to wait 10 years for the defaults to change – IIRC currently innodb buffer pool defaulting to 128Mb, which might be appropriate for some cases, but isn’t for many cases. Assigning an arbitrary number (like 128M or 1Gb or whatever) means that when memory profiles change, the number becomes obsolete.

      The 128M number is kept to match the smallest-sized VMs/cloud instances. Typically the smallest is 512M RAM, and has been for the last 5 years already. So you may indeed be correct that will be a long wait until it can increase.

  10. Oh, in case it’s not clear, I think “enabled” is the way to go. Especially because if it’s disabled, it needs to be changed to work, and the whole point is to make it work for people who don’t change the settings file.

  11. Ah, I see the nuance of the formulas vs. innodb_dedicated_server.

    I think innodb_dedicated_server should be on by default if there isn’t already a config file – not sure if that’s possible.

    If that’s not possible….well, it could end up messing up people’s innodb logs if they are using the default log size and then this setting changes it.

    BUT – I wouldn’t get the new variable until I upgrade, and I cannot upgrade while a server is running. So I think it should be enabled, and made SUPER clear in the release notes, that if you have problems that MySQL won’t start after upgrade, check your log file sizes, because this parameter has been enabled by default.

    Similarly for the innodb_flush_method, make sure to clearly state it in the release notes, that this default has changed.

  12. Like Sveta, I am also concerned about the default flush method (innodb_flush_method = O_DIRECT_NO_FSYNC) . At least for RedHat 7, the default filesystem is xfs.

    Since innodb_dedicated_server option is for not experienced administrators, I think the flush method has cover xfs too.

    Another thing that could be implemented is a way to store the settings that are not default to file (settings that is dynamically added).

  13. I think this is great idea.

    I believe MySQL is an extreme laggard when it comes to some auto tuning. The great example of the complex software which many of our use which does auto tuning and very successfully is Linux Kernel. While if you ask any “big boys” they will tell you they do tune Kernel for optimal performance its behavior is often good enough for majority of users

    I believe MySQL should autotune, and not only Innodb but also auto-scale default number of max connections, etc for resources available especially in the modern deployment environments, think docker container where it is expected what system will auto-scale and if you want to limit the size you use container settings.

    While implementing it one might want consider modern options such as cgroups to understand allowed memory rather than physical memory host can carry

Leave a Reply

Your email address will not be published. Required fields are marked *

Please enter * Time limit is exhausted. Please reload CAPTCHA.