Planning the defaults for MySQL 5.8

In the MySQL team we pay close attention to the default configuration of MySQL, and aim for users to have the best out of the box experience possible.

Following on from a series of blog posts for MySQL 5.7, we are now planning the defaults for MySQL 5.8. We are looking for feedback from you, our users, to share insights on what can be improved.

An Introduction

Being the default means that these settings should appeal to the largest group of users possible. This creates a constraint where values are often the lowest common denominator. The general guidelines for new defaults is as follows:

  1. Must work on virtual machines and cloud instances out of the box. Currently this stands at 64-bit Linux with 512M memory and swap disabled.
  2. Follow the principle of least surprise. This means that if a new option is introduced to potentially lose data (at increased performance) it should be off by default.
  3. No change for the sake of change. That is to say that some changes may not be suitable if the upgrade effort is disproportionate to the value brought.
  4. The test-suite should pass. For some changes, this can result in significant work behind the scenes.

Areas of Interest

We are looking at making improvements to the following areas. I have included in italics what feedback would be most useful. Please leave a comment, or get in touch! We would love to hear from you.

Character Set and Collation

We are considering changing the default character set to utf8mb4. Modern applications frequently store 4 byte characters, as emoji input is common for mobile devices. This change has some impact. For some backstory:

  • In MySQL 5.7 we introduced a new type of internal on-disk InnoDB table that is used by the optimizer for storing intermediate results which do not fit within the bounds of tmp_table_size (which currently use the MEMORY/HEAP engine). We hope to further enhance the performance of internal InnoDB tables so that they may also replace the current use of the MEMORY/HEAP engine, which does not support variable length columns, and thus may under-perform and over-consume when using multi-byte character sets.
  • In MySQL 5.7 we changed the default row format to DYNAMIC, which supports longer indexes than earlier row formats (previous: 767 bytes now: up to 3072 bytes).
  • The DYNAMIC (and COMPACT) row format will use variable length storage for multi-byte characters for both data and indexes, leading to no regression in storage on disk.
  • As of MySQL 5.7, the sort buffer will also pack variable length columns.
  • As the character set is saved on a per column, table, and schema basis (and preserved by mysqldump etc), MySQL upgrades will work smoothly with the user continuing to use their existing character set. New schemas will by default use the new default character set.

In addition to utf8mb4, we are also considering switching the default collation to be utf8mb4_unicode_520_ci. We plan to add collations to include per-language collation rules as we currently offer with latin1 based collations. They will have a more modern version of the UCA (Unicode Collation Algorithm) than the
utf8mb4_unicode_520_ci.

We invite feedback on this change.

Event Scheduler

We see opportunities to expand the SYS schema to include functionality such as idle transaction timeout. Enabling the event_scheduler by default is a stepping stone in the process. We are also excited to see what opportunities this will have for third party applications, for example Yahoo recently announced a Partition Manager based on events.

We invite feedback on this change.

Misc Session Buffers

With our goal being to fit within 512M of RAM out of of the box, we are looking to take a conservative approach on increasing per-session buffers. Having said that, there are a few settings which we have been received feedback on:

  • group_concat_max_len (currently 1K) – This buffer is the maximum size of GROUP_CONCAT() elements before truncation.
  • join_buffer_size (currently 256K) – The minimum size of the buffer that is used for plain index scans, range index scans, and joins that do not use indexes and thus perform full table scans. Also used by the BKA optimization (which is disabled by default).
  • read_buffer_size (currently 128K) – This buffer is used by MyISAM as well as all other engines.
  • read_rnd_buffer_size (currently 256K) – This buffer is used by MyISAM and the MRR optimization introduced by MySQL 5.6.
  • sort_buffer_size (currently 256K) – In MySQL 5.7, we introduced an optimization to pack variable length columns inside the sort buffer. We are interested to see if this changes recommendations for configuring the sort buffer.
  • max_allowed_packet (currently 4M) – This is the per-connection maximum size of an SQL query or row returned. The value was last increased in MySQL 5.6.

We invite feedback on new values for these settings.

Security

Our goal is to make sure that when you download and install MySQL, it is secure by default. This means that you should not require any configuration changes in order to deploy in production, and you must opt out to have less secure options. We are proposing the following changes:

  • symbolic_links – Many of our configuration files for MySQL packages (as well as those that ship with Linux distributions) currently default to disabling symbolic links. We feel that changing a de facto default (disabling symbolic links using configuration files) to a compiled default will improve user experience.
  • local_infile – This setting controls whether LOCAL is supported for LOAD DATA INFILE statements. We are proposing that this be disabled by default.
  • default_authentication_plugin – We would like to move towards sha256_password being the default authentication plugin, as the current SHA1 hash used by mysql_native_password has known vulnerabilities that are projected to be exploitable in the near future. This has the attached constraint that sha256_password requires TLS connections or RSA encrypting the password in transit.

We invite feedback on new values for these settings.

Slow Query Log

The default long query time of 10 seconds is more than two orders of magnitude larger than what we would expect most queries would take in a modern web application (100us-100ms). We are considering lowering long_query_time to be in a range closer to one order of magnitude longer than typical query execution time. We would also like to consider changing the values of log_slow_admin_statements and log_slow_slave_statements.

We would be interested to hear how you configure the slow query log. Since the slow query log is not enabled by default, we do not run the risk of filling a volume accidentally, and users who enable the slow query log will benefit from more optimal settings.

InnoDB

We have identified the following settings in InnoDB as candidates for a change in default:

  • innodb_autoinc_lock_mode (currently 1) – With row-based replication now the default in MySQL 5.7, we would like to also change the default value of innodb_autoinc_lock_mode to 2. This has the advantage of auto increment allocation being concurrent for multi-insert statements (by default it is only concurrent for single insert statements).
  • innodb_log_file_size (currently 48M) – With the InnoDB buffer pool now dynamic, we see an opportunity to increase the default log file so that a running system can be re-configured dynamically to perform better. We are hesitant to make the log files too large, as it increases the first-start time, which is important for our Docker efforts.
  • innodb_print_all_deadlocks (currently OFF) – Since MySQL 5.6 we have the option to print all deadlocks to the error log, which can be useful to retroactively investigate locking problems. To date, we have not enabled this option by default, because on a high-throughput system some level of deadlocks is natural, and logging all has the ability to fill logs very quickly. We are seeking opinions from the community as to which is the better default.
  • innodb_sort_buffer_size (currently 1M) – The InnoDB sort buffer is used by InnoDB during the creation of indexes. Larger values can improve index creation performance.
  • innodb_stats_persistent_sample_pages (currently 20) – In MySQL 5.6 we introduced a new persistent version of InnoDB sampling statistics (enabling it default). As part of this change, we also increased the default sampling from 8 to 20 pages, and have seen a good result in improving query plan stability. We have seen customer cases where increasing the sampling to 100-200 has been beneficial, and are considering increasing the default.

We invite feedback on new values for these settings.

InnoDB IO Configuration

Our current defaults are conservative in their assumptions around IO performance. With SSDs being prevalent on even low-end cloud machines, we may consider a set of defaults which assume faster IO performance. This will result in a reversal where users with slower IO (hard drives) will now need to make changes in order to perform better. The specific settings are:

  • innodb_flush_method – InnoDB supports direct IO via either O_DIRECT or O_DIRECT_NO_FSYNC. The benefit of doing this, is that the buffer pool pages will not be double buffered in the fileystem’s cache. We think this is a good default for fast IO systems, but may be worse in slow IO systems where the filesystem cache has benefit. As part of this change, we would also like to switch Windows to use unbuffered IO.
  • innodb_flush_neighbors – InnoDB by default will attempt to merge writes to adjacent pages as part of its background flushing operations. This optimization works very well for slow IO devices (such as hard drives), but is not always useful on fast IO devices. We are considering changing the default to be OFF.
  • innodb_page_cleaners – In some workloads the default of one page cleaner is unable to keep up with the background work of flushing dirty pages from the buffer pool. With fast IO devices benefitting from writes occurring in parallel, we are considering a larger default value.

We invite feedback on whether you agree the defaults should now assume fast IO devices. Users with slower IO performance can of course restore previous defaults.

Timeouts

We are considering changing some of the default values for various timeouts, many of which are very long for today’s web applications. The settings we invite feedback on are:

We will likely need to be conservative in making changes to timeouts, but would be interested in hearing what values our users are using for these settings.

Misc

The following settings do not fit under a specific category, but have been previously raised as suggestions from our users:

  • lower_case_table_names (currently 0) – MySQL is currently case-sensitive on filesystems that are case-sensitive, and insensitive on those that are not. It has been suggested that being case-insensitive on all may provide a more consistent cross platform experience.
  • max_connect_errors (currently 100) – If more than this many successive connection requests from a host are interrupted without a successful connection, the server blocks that host from further connections.
  • max_digest_length (currently 1024) – We use the max_digest_length in a number of places in the server; such as the MySQL Firewall and statement rewrite. We have heard feedback from a number of users that the default is too low and leads to statement truncation. Changing the setting to a higher value is also inconvenient as it is not dynamic. We would like to increase default max_digest_length to a value which covers the maximum length for most typical queries.
  • thread_cache_size (autosized) – In MySQL 5.6 we changed the thread cache to autosize by default. We’ve heard feedback from users that they are still changing this setting, and are inviting feedback as to whether the autosize algorithm could be improved.

We invite feedback on new values for these settings.

Replication

Configuration changes to replication are covered in this dedicated post on the MySQL High Availability blog.

Conclusion

This post is our first in what will likely be many more to follow. We look forward to your feedback, and thank you for helping make MySQL better!

8 thoughts on “Planning the defaults for MySQL 5.8

  1. Good. And Bad. Good that the new defaults are better. Bad in that migrating folks will incur pain, and the forums will be littered with simple questions about how to fix what they think are screwups.

    The most important thing that is not set “out of the box” is innodb_buffer_pool_size. It should be set to some large percentage (I say 70%) of _available_ RAM. The percentage is easy, the knowledge of how much RAM is _available_ is problematical. The old small/med/large my.cnf files are pathetic by today’s needs. To work on 0.5GB of RAM, the buffer_pool must rediculously small. In addition to automatically discovering the RAM size, perhaps one question would suffice to handle “available”: “Will this be a standalone MySQL server?”

    utf8mb4 — good decision. Currently the biggest impediment is the 767, which becomes VARCHAR(191), thereby breaking existing VARCHAR(255) schemas. But, with default of DYNAMIC and Barracuda and innodb_file_per_table = ON and maybe something else, then it could work. In the past it has been 4 painful steps to relax the index limitations; it would be nice to avoid that in the future.

    I am reluctant to say innodb_file_per_table = ON is the right default. It’s good for _big_ tables. It’s not so good for _tiny_ tables, especially if there are hundreds or thousands of them. (Yeah, people still do dumb things like that.)

    The tmp table improvements seem good; I am withholding judgement until they settle down.

    utf8mb4_unicode_520_ci — Yes, that is probably the ‘right’ one. It has been the _only_ solution to about 2 forum questions that I have encountered. And it probably does not hurt anyone who would be otherwise using unicode_ci.

    The movement of things out of information_schema into performance_schema is somewhat disruptive and seems gratuitous; I would like to here more on the justification. Will more be moving out?

    I assume that EVENTs suffer the same flaw that cron suffers — If the mysqld is down when it is time to launch an event, it will be missed. This can (and should) be coded-around in, say, a Partition Manager. But it is one extra thing to worry about.

    group_concat_max_len — 1K is too small. I see no harm in increasing it, but I don’t know whether it is allocated full-size, and other issues that could cause RAM usage to balloon.

    The various *_buffer_size — Its hard to have an opinion without understanding how their sizes relate to index sizes (etc), and without having sufficient metrics to judge when “larger” (or “smaller”) would be advisable.

    Packing variable length columns (sort_buffer, Memory, etc) — well, Duh?

    If you change the local_infile default (which is currently insecure), please make the error message clear as to why the LOAD DATA INFILE is refusing to run.

    long_query_time — 10 is virtually useless. 1 may be a reasonable compromise between “useful” and “filling up the disk”.

    query_cache_type=0 _AND_ query_cache_size=0 — Default _both_ to off/0.

    log_slow_admin/slave_statements — Turn ON by default.

    log_queries_not_using_indexes — Turn OFF by default. I find that it clutters the slow log while not providing useful info. The only time when it might be useful is when you have a tiny amount of data and you expect the data to grow a lot.

    Include the EXPLAIN in the slowlog.

    Don’t limit the TABLE version of the slowlog to integers. Why not FLOAT, instead.

    The configurations for the many logs are inconsistent, confusing, and multi-step. Rotating logs should be consistent. It should not take 2 commands to turn on an optional log. Etc. Yes, General, Slow, and Bin logs should be off by default, as a disk protection.

    innodb_log_files_in_group may as well be hard-coded as 2. Or is there some argument for ever having more than 2?

    Several logs should be subject to some form of throttling. For example, innodb_print_all_deadlocks leads to thousands of deadlocks per day, the info can be overwhelming. Eliminating “dups” when there are a lot may be practical and useful.

    innodb_log_file_size — defaulting to some fraction of buffer_pool_size seems reasonable, but the current wisdom of 1/2 or 1/4 may be unnecessarily big.

    (More later)

  2. I have responded to more than 15K forum questions, so I think I know more than most what the users’ pulse is.

    If you see any “errors” in what I am saying, please point them out.

    innodb_flush_method=O_DIRECT — Can this be defaulted “correctly” based on information found in the OS?

    innodb_flush_neighbors — I agree with defaulting to OFF. ON probably hurts SSD and RAID systems, where you have paid for performance. On old spinning drives, you are probably running a non-performance-critical application. Can’t you discover the hardware and make an educated guess??

    innodb_page_cleaners — What STATUS value(s) says whether there are enough cleaners? Having the setting is lame. Discover over time that more are needed and dynamically increase the number.

    lock_wait_timeout — 50 seconds is “too long”. If you have a transaction that has to wait more than 5 seconds, something needs fixing. Is that a STATUS (or P_S) for avg_wait_timeout?

    wait_timeout — very confusing — GLOBAL versus SESION and “batch” versus “interactive”. Users don’t know how to change the right thing; I can’t remember what to tell them. Is thare a set of STATUS values to say how often timeouts occurred?

    lower_case_table_names — only a minor nuisance; not many people mix or migrate between Windows and Unix. Changing to all case-insensitive could be a rude surprise for some users.

    max_connect_errors — half of users leave it at 10. I have perhaps never heard of an issue with that setting. Raising it gives hackers a little more chance to sneak in. Suggest leaving it low.

    max_digest_length — increase.

    thread_cache_size — this used to be a common source of both issues and confusion. As I understand it, 0 is best for Windows. And a small number (say, 10) is good enough for virtually all Unix systems. However, the confusion comes from DBAs thinking it is the total number of entries, not just the number kept in reserve. Maybe a name change, followed (eventually) by removing the current name?

    key_buffer_size — With the default being InnoDB, the default for key_buffer_size can probably be shrunk. Perhaps 1% of RAM?

    max_connections — The change to 151 (5.1.15) seems to be quite OK. (I think it is bad to set Apache’s MaxClient to more than about 20.) Many DBAs increase it past 151; some get in trouble.

    max_allowed_packet — It is a common topic, so it can probably be raised. However, I do not know the down-side of setting it to, say, 1G.

    myisam_sort_buffer_size — 8M seems too small, but I don’t know. Maybe 1% of RAM would be better? About half of users leave it at 8M; most of the rest increase it.

    1. 1% of ram if using the min spec machine of 512MB would be 5MB, ie less than the current default of 8MB, perhaps better to say 1% or 8MB, whichever is more.

  3. utf8mb4 as default for connections, tables, schemas and columns: Yes please.
    I also think utf8mb4_unicode_520_ci as default is very usefull.
    But what about utf8mb4_general_cs?

    event scheduler: Yes nice, but this might wakup the tread often, which is not so green and a bit wasteful. Maybe auto-start it on the creation of the first event?

    Security: I’m not sure sha256_password is ready for primetime. Not all software fully supports authentication plugins.
    What about hostname verification for SSL/TLS? Maybe only for replication?

    Slow Query Log: I’m not sure, P_S is much more useful, so I don’t think this is that important.

    innodb_autoinc_lock_mode=2 Yes please. This makes online schema change tools work better on (very) active tables.

    And please try to change them in (or before) the first DMR release to ensure they get proper testing before the GA release.

  4. Hi Morgan,

    One feature I’m really missing in the slow query log is the possibility to enable it (and the eventual changes like the slow query time) for persistent connections.

    If we want to capture everything on a running server I usually set long_query_time = 0, but unfortunately this doesn’t affect the actual connections.

    Cheers,

Leave a Reply

Your email address will not be published. Required fields are marked *

Please enter *