In MySQL 8.0, we are making large changes to the way the MySQL server stores metadata with the introduction of our native data dictionary. As part of these improvements, we have also changed the way the server bootstraps. This blog post explores what happens when the MySQL server starts, and in particular, how we initialize the transactional data dictionary. We have changed this area over several iterations, and we will point out the improvements in functionality and implementation, and how we think they will enable further long-term improvements.
We can start the MySQL server in two different ways:
- Initial start: When we start the MySQL server with a new data directory, we refer to this as the initial start. In this starting mode, the server creates the directories and files that it needs, and initializes the persistent data required for later restarts. When the initialization finishes, the server exits.
- Restart: After an initial start, we may restart the server. In this case, the server will use the files and directories that were created under the data directory when the server was doing the initial start.
Below, we will take a closer look at the first item above: initial start, explain how this used to be done, how it is done in MySQL 8.0, and why we think the changes in this area are improvements.
mysql_install_db and a file-based dictionary in MySQL 5.6
Initial start used to be done by a Perl script called mysql_install_db, which started the MySQL server with the command line option --bootstrap. The Perl script assembled SQL scripts and fed them into the server. The SQL scripts contained e.g. the CREATE TABLE statements necessary to create the system tables.
This approach posed a few problems:
- The server executable wasn’t self-contained. In addition to the executable, both the SQL scripts and the mysql_install_db script had to be present.
- The various scripts were plain files on disk, so they could be broken by accidental or ill-advised editing.
- Portability across platforms was challenging due to the support needed for running the mysql_install_db script.
mysqld --initialize and a file-based dictionary in MySQL 5.7
In MySQL 5.7, some of the shortcomings in bootstrapping were addressed. The mysql_install_db client was deprecated, and instead, the server was extended with a new command line option, --initialize, to do what mysql_install_db used to. The SQL scripts, previously shipped as separate files, were now built into the server executable, thus making the server binaries self-contained. In terms of the problems listed above:
- The SQL scripts are now built into the server executable as character strings. The external Perl script is gone, and the server iterates over the strings and executes the statements. Thus, the server executable is now self-contained.
- Being compiled into the server binary, the SQL scripts cannot be easily changed, unless you modify the source and recompile the server.
- Getting rid of the external Perl script also means we do not rely on a script interpretation engine, which facilitates portability across platforms.
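The idea of compiling the SQL scripts into the executable and iterating over them can be sketched roughly as follows. This is a minimal illustration, not the server's actual code: the array kBootstrapStatements, the function run_bootstrap, the example table name and the execute callback are all hypothetical names chosen for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for the SQL scripts compiled into the binary as
// character strings. The real statement list is far longer.
static const char *const kBootstrapStatements[] = {
    "CREATE DATABASE IF NOT EXISTS mysql",
    "CREATE TABLE IF NOT EXISTS mysql.example_system_table (id INT PRIMARY KEY)",
};

// Feeds every embedded statement to `execute` (an assumed callback standing
// in for the server's SQL execution path). Stops at the first failure and
// returns the number of statements that were executed successfully.
std::size_t run_bootstrap(
    const std::function<bool(const std::string &)> &execute) {
  assert(static_cast<bool>(execute));
  std::size_t done = 0;
  for (const char *stmt : kBootstrapStatements) {
    if (!execute(stmt)) break;
    ++done;
  }
  return done;
}
```

Because the statements live inside the executable, there is no external file to lose or edit, and no script interpreter is needed to drive the loop.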
mysqld --initialize and a transactional dictionary in MySQL 8.0
Conceptually, initial start in MySQL 8.0 is in many ways similar to the way it was in MySQL 5.7, but the infrastructure for making it happen has been extended. We now represent the dictionary tables by abstractions rather than just plain SQL CREATE TABLE statement strings. This provides a mechanism to e.g. more easily support automated upgrade of the tables.
- The table metadata is stored in InnoDB tables rather than in .frm files, which are now abandoned.
- The dictionary tables are defined by C++ object instances providing a structured representation of the metadata, with e.g. lists of fields with data types, ordinal positions, collations etc. The CREATE TABLE statements used to create the dictionary tables are synthesized from these C++ objects defining the tables.
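To illustrate what synthesizing a CREATE TABLE statement from a structured definition might look like, here is a small sketch. It does not reproduce the server's real classes: ExampleColumn, ExampleTableDef and create_statement are hypothetical names, and the layout of the generated SQL is an assumption made for this example.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical structured description of one column.
struct ExampleColumn {
  std::string name;
  std::string type;       // e.g. "BIGINT UNSIGNED NOT NULL"
  std::string collation;  // left empty for non-string columns
};

// Hypothetical structured description of a dictionary table.
struct ExampleTableDef {
  std::string schema;
  std::string name;
  std::vector<ExampleColumn> columns;  // vector order = ordinal position
  std::string primary_key;

  // Builds the CREATE TABLE statement from the structured definition.
  std::string create_statement() const {
    assert(!columns.empty());
    std::string sql = "CREATE TABLE " + schema + "." + name + " (";
    for (const ExampleColumn &c : columns) {
      sql += c.name + " " + c.type;
      if (!c.collation.empty()) sql += " COLLATE " + c.collation;
      sql += ", ";
    }
    sql += "PRIMARY KEY (" + primary_key + ")) ENGINE=InnoDB";
    return sql;
  }
};
```

Since the SQL text is derived from the object, other code can inspect the same object for field names, types and collations instead of parsing a statement string.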
The future of bootstrapping
The change in the dictionary table definitions, now represented by C++ objects rather than SQL statements, points in the direction we would like to go, also for the system tables (i.e., the tables not belonging to the data dictionary). Representing the tables by abstractions allows for a tighter integration between the server and the dictionary tables in terms of e.g.:
- Explicitly using the appropriate string comparison functions for the given collation.
- Automatically generating keys for fetching records from the tables, using the correct types and number of fields, the correct index, etc.
- Supporting upgrade based on an actual table definition being present in the data directory, and a different target definition being prescribed by the server executable. This allows for more tailored upgrade scenarios than the catch-all statements used by the current upgrade implementation, and may also pave the way for allowing upgrade based on the server binary alone, without external scripts or client utilities.
- Folding the table definitions and the handling of the tables into the server binary means that we can protect the tables to a much larger extent than we could previously.
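As an illustration of the upgrade idea in the list above, the sketch below compares an actual table definition with a target definition and derives only the statements needed to get from one to the other. It is a simplified assumption of how such a comparison could work, not the server's upgrade logic: the ExampleColumns type and the plan_upgrade function are hypothetical, and only column additions, type changes and removals are considered.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical simplified definition: column name -> column type.
using ExampleColumns = std::map<std::string, std::string>;

// Compares the definition found in the data directory (`actual`) with the
// one prescribed by the server executable (`target`) and returns the ALTER
// statements needed to reach the target. Statements for added or changed
// columns come first (in name order), followed by dropped columns.
std::vector<std::string> plan_upgrade(const std::string &table,
                                      const ExampleColumns &actual,
                                      const ExampleColumns &target) {
  assert(!table.empty());
  std::vector<std::string> stmts;
  for (const auto &col : target) {
    auto it = actual.find(col.first);
    if (it == actual.end()) {
      stmts.push_back("ALTER TABLE " + table + " ADD COLUMN " + col.first +
                      " " + col.second);
    } else if (it->second != col.second) {
      stmts.push_back("ALTER TABLE " + table + " MODIFY COLUMN " + col.first +
                      " " + col.second);
    }
  }
  for (const auto &col : actual) {
    if (target.find(col.first) == target.end()) {
      stmts.push_back("ALTER TABLE " + table + " DROP COLUMN " + col.first);
    }
  }
  return stmts;
}
```

When the actual and target definitions are identical, the plan is empty, which is the kind of tailored behavior a catch-all upgrade script cannot easily provide.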
The initialization of the transactional data dictionary has some interesting aspects in terms of how to store the metadata of the dictionary tables themselves. After all, these tables can in many ways be considered ordinary tables, and their metadata should be stored in the tables themselves. This is done when a table is created, but at the same time, the tables do not exist until after they are created. Solving this chicken-and-egg problem will be the topic of another blog post in this area.