The latest release of MySQL 8.0 introduces a new dynamic system variable, @@innodb_buffer_pool_in_core_file, which lets you omit the Buffer Pool’s memory content when generating a core file.
This change is an adaptation of a patch contributed by the Facebook team. We would like to thank and acknowledge this important and timely contribution by Facebook.
A Core File
A core file or core dump is a file that records the memory image of a running process and its process status (register values etc.). Its primary use is post-mortem debugging of a program that crashed while it ran outside a debugger.
To enable core file creation in case MySQL crashes, you have to specify the --core-file command line option when running mysqld, which changes the value of the read-only @@core_file system variable to ON from its default value of OFF.
For example, suppose you happen to be using Linux, and you’ve run
./bin/mysqld --core-file --datadir=/var/mysql/data
and it crashed due to some bug (which you can simulate using kill -s SIGABRT $pid, where $pid is the id of your mysqld process). Then you can inspect the state just before the program crashed using:
gdb ./bin/mysqld /var/mysql/data/core.$pid
The exact filename and location of the core file depend on your particular system configuration. Our example assumes that cat /proc/sys/kernel/core_pattern outputs core.%p, that cat /proc/sys/kernel/core_uses_pid outputs 1, and that /var/mysql/data is your data directory (which the mysqld process uses as its current working directory, which is why the core file is created there by default).
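To make the naming rule concrete, here is a minimal Python sketch of how the two kernel settings combine into a core file path. The function name predict_core_path is made up for this illustration, and the sketch is deliberately simplified: it only understands the %p and %% specifiers and leaves any other specifier (such as %e or %t) unexpanded.

```python
import posixpath


def predict_core_path(pattern: str, uses_pid: int, pid: int, cwd: str) -> str:
    """Simplified expansion of a Linux core_pattern (handles only %p and %%)."""
    if pattern.startswith("|"):
        # A leading pipe means the dump is piped to a program, not written to a file.
        raise ValueError("core dumps are piped to a helper program")
    escaped = pattern.replace("%%", "\0")   # protect literal percent signs
    has_pid = "%p" in escaped
    name = escaped.replace("%p", str(pid)).replace("\0", "%")
    # core_uses_pid appends ".PID" only when the pattern has no %p of its own.
    if uses_pid and not has_pid:
        name += "." + str(pid)
    # A relative name is resolved against the crashing process's working directory.
    return posixpath.join(cwd, name)


# The article's configuration: core_pattern=core.%p, core_uses_pid=1
print(predict_core_path("core.%p", uses_pid=1, pid=4242, cwd="/var/mysql/data"))
# prints /var/mysql/data/core.4242
```

With an absolute pattern such as /tmp/cores/core.%p the working directory is ignored, which is why the core file may not show up in the data directory on systems configured that way.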
The Buffer Pool
The InnoDB Buffer Pool is a storage area for caching data and indexes in memory. Together with the InnoDB Redo Log and the data pages persisted on disk, it forms a low-level abstraction of I/O: data can be thought of as divided into pages, where each page is identified by its tablespace id and page id, and InnoDB can load, modify, and store such pages atomically. The more complicated structures of the various primary and secondary indexes are built only on top of this abstraction, using these low-level pages to store, for example, the nodes of trees.
To perform any work on a page, the page needs to be brought from disk to memory, and the place in memory where we keep such pages is called the Buffer Pool. Subsequent uses of the same page can be served from the Buffer Pool as long as the page has not been removed from it (a.k.a. evicted), which may happen if there is not enough space in memory to hold all the pages that are accessed. In this regard the Buffer Pool serves as a cache for pages on disk. Also, to avoid writing a page to disk each time it is modified, a page is only marked as dirty in memory and the write to disk is deferred until it is really necessary; only the information needed to recreate the state of the page after a crash is written to the append-only write-ahead log (called the Redo Log). One can see from this rough description that the larger the Buffer Pool, the rarer the situations in which we have to perform costly disk I/O operations. Thus, the Buffer Pool is often configured to consume a considerable fraction of available RAM.
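The caching behavior described above can be illustrated with a toy model. The following Python sketch is not how InnoDB is implemented; the class ToyBufferPool, its plain LRU eviction policy, and the dict standing in for the disk are all assumptions made for the sake of a short example. It only shows the three ideas from the text: cache hits avoid disk reads, dirty pages are written back lazily, and every modification is appended to a redo log.

```python
from collections import OrderedDict


class ToyBufferPool:
    """Toy page cache: LRU eviction, dirty flags, append-only redo log."""

    def __init__(self, capacity, disk):
        self.capacity = capacity      # maximum number of pages held in memory
        self.disk = disk              # dict: (space_id, page_no) -> page content
        self.pages = OrderedDict()    # cached pages, least recently used first
        self.dirty = set()            # pages modified in memory, not yet on disk
        self.redo_log = []            # append-only record of modifications
        self.disk_reads = 0
        self.disk_writes = 0

    def _evict_if_full(self):
        while len(self.pages) >= self.capacity:
            page_id, data = self.pages.popitem(last=False)
            if page_id in self.dirty:
                # The deferred write to disk happens only at this point.
                self.disk[page_id] = data
                self.disk_writes += 1
                self.dirty.discard(page_id)

    def get(self, page_id):
        if page_id in self.pages:
            self.pages.move_to_end(page_id)   # cache hit: no disk I/O
        else:
            self._evict_if_full()
            self.pages[page_id] = self.disk[page_id]
            self.disk_reads += 1
        return self.pages[page_id]

    def modify(self, page_id, data):
        self.get(page_id)                      # the page must be in memory first
        self.redo_log.append((page_id, data))  # enough to redo the change later
        self.pages[page_id] = data
        self.dirty.add(page_id)


disk = {(0, n): b"v0" for n in range(3)}
pool = ToyBufferPool(capacity=2, disk=disk)
pool.modify((0, 0), b"v1")   # one disk read; the page is now dirty in memory
pool.get((0, 0))             # served from memory
pool.get((0, 1))             # second disk read
pool.get((0, 2))             # pool is full: page (0, 0) is evicted and written back
print(pool.disk_reads, pool.disk_writes, disk[(0, 0)])
```

In this toy, raising the capacity to 3 removes the eviction and the write entirely, which is the effect the paragraph above attributes to a larger Buffer Pool.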
Since the Buffer Pool resides in main memory, and the memory of a process is dumped to a core file, it follows that a huge Buffer Pool results in a huge core file. This can be problematic for several reasons:
- a big file consumes space on disk, which can create a cascade of problems if there is not enough space
- a big file takes longer to write
- a big file is more difficult to move around, in particular when one needs to send it to somebody else for analysis
Also, the Buffer Pool contains pages of the database, which poses some security considerations when it gets dumped to a file.
There are, however, cases where investigating the crash would benefit from having access to the exact content of pages at the moment of the crash.
So, there are good reasons to exclude the Buffer Pool from a core file, but there are also scenarios where you would prefer to have the data.
Advising the operating system about our intention
On Linux 3.4+ a programmer can use a non-POSIX extension to the madvise() interface by calling madvise(ptr,size,MADV_DONTDUMP) to let the operating system know that size bytes of memory pointed to by ptr should not be dumped to a core file.
In the patch contributed by Facebook, madvise() was used on all large buffers allocated by MySQL to make core files smaller. We have ported this patch to MySQL 8.0, narrowing it down to the Buffer Pool pages only.
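To try the underlying system call without writing C, the same advice can be issued from Python (3.8 or later), whose mmap objects expose madvise(). This is only a sketch of the mechanism, not the code used in the server; the helper name allocate_excluded_from_core is made up for the example, and it assumes a Linux system where the MADV_DONTDUMP constant is available.

```python
import mmap


def allocate_excluded_from_core(size: int) -> mmap.mmap:
    """Allocate anonymous memory and ask the kernel to leave it out of core dumps."""
    buf = mmap.mmap(-1, size)  # anonymous, page-aligned mapping
    dontdump = getattr(mmap, "MADV_DONTDUMP", None)
    if dontdump is None or not hasattr(buf, "madvise"):
        buf.close()
        raise OSError("MADV_DONTDUMP is not available on this platform")
    # Applies the advice to the whole mapping; raises OSError if the kernel rejects it.
    buf.madvise(dontdump)
    return buf


try:
    pool = allocate_excluded_from_core(16 * 1024 * 1024)
    print("advised kernel to skip", len(pool), "bytes in core dumps")
except OSError as exc:
    print("could not exclude buffer from core dumps:", exc)
```

The explicit failure path matters: a caller that cannot get the advice accepted has to decide what to do instead of silently dumping the memory anyway.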
The innodb_buffer_pool_in_core_file variable
Striving for backward compatibility, we’ve introduced a new system variable, @@innodb_buffer_pool_in_core_file, which is set to ON by default in order to mimic the old behavior. Also, this new variable only affects behavior if @@core_file is ON, as otherwise no core file will be generated at all.
Only when this variable is set to OFF (for example by passing --skip-innodb-buffer-pool-in-core-file via the command line) do we change the behavior. If all of the following conditions hold:
- @@core_file is set to ON
- the operating system supports MADV_DONTDUMP
the OS will be advised to exclude the Buffer Pool pages from a core file.
When something goes wrong, we’ve decided to err on the safe side. If the user didn’t want the Buffer Pool data to be included in a core file, but the operating system does not fully support that intention, we make sure that the core file will not be generated at all. We believe that this is a better option than writing the core file anyway, which might expose sensitive data or overflow the disk. Thus, if @@innodb_buffer_pool_in_core_file is disabled but an madvise() failure occurs or marking Buffer Pool pages as MADV_DONTDUMP is unsupported by the operating system, an error is written to the server’s error log and the @@core_file variable is disabled to prevent the core file from being written.
This may sound a bit complicated, so here is a table covering all cases:
|@@core_file||@@innodb_buffer_pool_in_core_file||OS supports MADV_DONTDUMP||effect|
|OFF||*||*||no core file will be generated at all|
|ON||ON||*||a core file will be generated including the Buffer Pool|
|ON||OFF||yes||a core file will be generated without the Buffer Pool|
|ON||OFF||no||no core file will be generated at all, @@core_file is disabled and an error is written to the server’s error log|
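The same rules can be written as a small decision function, which may be easier to scan than the table. This is a Python restatement of the cases described above, not server code; the function name core_file_effect and the returned strings are invented for the illustration.

```python
from itertools import product


def core_file_effect(core_file: bool,
                     buffer_pool_in_core_file: bool,
                     os_supports_dontdump: bool) -> str:
    """Return what happens on a crash for one combination of settings."""
    if not core_file:
        return "no core file"
    if buffer_pool_in_core_file:
        return "core file including the Buffer Pool"
    if os_supports_dontdump:
        return "core file without the Buffer Pool"
    # Fail safe: the user asked for exclusion but the OS cannot honor it.
    return "no core file (@@core_file disabled, error logged)"


for combo in product([False, True], repeat=3):
    print(combo, "->", core_file_effect(*combo))
```

Note that the operating system's capability only matters in the last two branches, which is why the first two table rows use a wildcard for it.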
In an effort to make run-time configuration as smooth and simple as possible, this new @@innodb_buffer_pool_in_core_file system variable is dynamic, so you can change its value whenever you like, for example using this command:
SET GLOBAL innodb_buffer_pool_in_core_file = OFF;
(Keep in mind that, as explained above, on systems which do not support MADV_DONTDUMP the above command will set @@core_file to OFF, and since @@core_file is read-only there is no way to set it back to ON without restarting the server. You can use ./mtr --mem innodb.mysqld_core_dump_without_buffer_pool to check if your system supports this feature.)
To restore the old behavior use:
SET GLOBAL innodb_buffer_pool_in_core_file = ON;
You can check the current value using:
SELECT @@global.innodb_buffer_pool_in_core_file;
You might also want to check if @@core_file is enabled:
SELECT @@global.core_file;
How much you gain by setting this variable to OFF obviously depends on how large your Buffer Pool was in the first place, but to give you a rough idea, here’s a table with example results for
As you can see, the InnoDB page size itself impacts the size of the core file, because the smaller the page, the larger the number of pages, and thus the more metadata for these pages. The difference between ON and OFF is not exactly 1GB, as the core file size varies somewhat between runs and with the exact moment of crashing the server, and obviously you can’t easily crash the same process more than once.
Thank you for using MySQL!