InnoDB is a transactional storage engine. Two parts of the acronym ACID (atomicity and durability) are guaranteed by write-ahead logging (WAL) implemented by the InnoDB redo log.
A statement within a user transaction can consist of multiple operations, such as inserting a record into an index B-tree. Each low-level operation is encapsulated in a mini-transaction that groups page-level locking and redo logging. For example, if an insert would cause a page to be split, a mini-transaction will lock and modify multiple B-tree pages, splitting the needed pages and finally inserting the record.
On mini-transaction commit, the local mini-transaction log is appended to the global redo log buffer, the page locks are released, and the modified pages are inserted into the flush list. Only after the log buffer has been written out to the redo log file can InnoDB write the modified pages from the buffer pool back to the tablespace files.
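The commit sequence and the write-ahead rule above can be modelled in a few lines. This is a minimal Python sketch, not InnoDB code: the classes `Page`, `RedoLog` and `MiniTransaction`, the `MTR_END` end marker and the helper `may_write_page` are hypothetical names. Page locking is not modelled, and the LSN advances by one per record instead of by bytes as in the real redo log.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    page_id: int
    newest_lsn: int = 0  # end LSN of the last mini-transaction that modified the page


@dataclass
class RedoLog:
    buffer: list = field(default_factory=list)  # global redo log buffer
    file: list = field(default_factory=list)    # records that reached the redo log file
    lsn: int = 0          # LSN of the last record appended to the buffer
    flushed_lsn: int = 0  # LSN up to which the log is durable

    def append(self, records):
        for rec in records:
            self.lsn += 1  # simplification: one LSN unit per record
            self.buffer.append((self.lsn, rec))
        return self.lsn

    def write_to_file(self):
        self.file.extend(self.buffer)
        self.buffer.clear()
        self.flushed_lsn = self.lsn


class MiniTransaction:
    def __init__(self, log):
        self.log = log
        self.local_log = []  # redo records buffered locally until commit
        self.pages = {}      # pages modified (and, in InnoDB, locked) by this mtr

    def modify(self, page, record):
        self.pages[page.page_id] = page
        self.local_log.append(record)

    def commit(self, flush_list):
        # 1. Append the local log plus an end marker to the global buffer.
        end_lsn = self.log.append(self.local_log + ["MTR_END"])
        # 2. Page locks would be released here (not modelled).
        # 3. Insert the modified pages into the flush list.
        for page in self.pages.values():
            page.newest_lsn = end_lsn
            flush_list[page.page_id] = page


def may_write_page(log, page):
    """WAL rule: a dirty page may be written back only once its redo is durable."""
    return page.newest_lsn <= log.flushed_lsn


log = RedoLog()
flush_list = {}
page = Page(page_id=3)
mtr = MiniTransaction(log)
mtr.modify(page, "insert record")
mtr.commit(flush_list)
print(may_write_page(log, page))  # False: the redo is only in the buffer
log.write_to_file()
print(may_write_page(log, page))  # True: the redo is now in the file
```

The page sits in the flush list as soon as the mini-transaction commits, but the sketch refuses to write it until `flushed_lsn` has caught up with the page's `newest_lsn`.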
If the system crashes before the end marker of the mini-transaction log is written out to the redo log, the entire mini-transaction (and any subsequent mini-transactions) will be discarded. An operation is made durable by a combination of mini-transaction commit and a write to the redo log file. (Unless the undo log header update for the user transaction commit was written to the log before the crash, the user transaction will be rolled back.)
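The discard rule can be sketched as a log scan that keeps only mini-transactions whose end marker was read. In this hypothetical Python model, redo records are plain strings, `"MTR_END"` stands for the end marker, and `None` stands for an unreadable (for example torn) part of the log; the function name `recover_mtrs` is illustrative only.

```python
def recover_mtrs(log_records):
    """Return the mini-transactions that recovery would replay.

    The scan stops at the first unreadable record (None in this model). The
    mini-transaction in progress at that point has no end marker, so it is
    dropped together with everything that follows it.
    """
    complete, current = [], []
    for rec in log_records:
        if rec is None:
            break
        if rec == "MTR_END":
            complete.append(current)
            current = []
        else:
            current.append(rec)
    # Records left in `current` never got an end marker: discard them.
    return complete


redo = ["split page 5", "split page 9", "insert rec", "MTR_END",
        "update rec", None, "insert rec 2", "MTR_END"]
print(recover_mtrs(redo))  # [['split page 5', 'split page 9', 'insert rec']]
```

The second mini-transaction in `redo` is lost even though a later end marker exists, because the scan cannot get past the unreadable record.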
Problems with Single-Table Tablespaces
MySQL 4.1 introduced the setting innodb_file_per_table, which instructs tables to be created in separate files, instead of creating them inside the InnoDB system tablespace, which is usually called
The InnoDB redo log refers to tablespaces by a number, space_id, while the file system knows the tablespaces as tablename.ibd. If InnoDB notices at startup that the system was not shut down cleanly (there were some log records written since the latest log checkpoint), it has to construct a mapping from space_id to file names, so that the redo log records can be replayed to compensate for the missing writes of modified pages from the buffer pool to the files.
To construct this mapping, InnoDB used to traverse the data directory, reading the first page of every *.ibd file. Because there could be thousands of files, this could cause a lot of unnecessary downtime when only a few of them had been modified since the latest log checkpoint.
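The cost of the old discovery step comes from its shape: every *.ibd file is opened, whether or not it was touched since the checkpoint. The following Python sketch only illustrates that shape. It assumes the classic InnoDB page header layout, in which page 0 stores the 4-byte big-endian space_id at byte offset 34; the function name `scan_data_directory` is hypothetical.

```python
import os
import struct

SPACE_ID_OFFSET = 34  # assumed offset of the space_id field in the page 0 header


def scan_data_directory(datadir):
    """Pre-5.7 style discovery: open every *.ibd file and read its first page."""
    mapping = {}
    files_read = 0
    for root, _dirs, files in os.walk(datadir):
        for name in files:
            if not name.endswith(".ibd"):
                continue
            path = os.path.join(root, name)
            with open(path, "rb") as f:
                header = f.read(SPACE_ID_OFFSET + 4)
            files_read += 1  # one open and read per file, modified or not
            if len(header) < SPACE_ID_OFFSET + 4:
                continue  # too short to contain a page header
            (space_id,) = struct.unpack_from(">I", header, SPACE_ID_OFFSET)
            mapping[space_id] = path  # a duplicate space_id overwrites the earlier entry
    return mapping, files_read
```

`files_read` grows with the number of tables in the instance, not with the number of tablespaces that actually need redo applied.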
When MySQL 5.6 implemented the DATA DIRECTORY clause for InnoDB tables, it introduced *.isl files as placeholders pointing to the real location of *.ibd files. This added further complexity to tablespace file discovery.
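The extra indirection can be sketched as follows. This Python function is hypothetical and assumes that a table's *.isl file sits in the database directory under the table name and holds the path of the remote *.ibd file as plain text; it is not the server's lookup code.

```python
import os


def resolve_tablespace_path(datadir, db, table):
    """Return the path of a table's .ibd file, following a .isl placeholder if present."""
    isl_path = os.path.join(datadir, db, table + ".isl")
    if os.path.exists(isl_path):
        with open(isl_path, encoding="utf-8") as f:
            # Assumed format: the placeholder holds the remote .ibd path as text.
            return f.read().strip()
    # No placeholder: the tablespace lives in the data directory.
    return os.path.join(datadir, db, table + ".ibd")
```

Discovery now needs a second file per remote table, and the placeholder can disagree with what is actually on disk, for example if it points to a file that was moved.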
The MySQL 5.7 Solution: Write File Names to the Redo Log
MySQL 5.7 introduces a new redo log record type, MLOG_FILE_NAME, for identifying those non-predefined files that were changed since the latest log checkpoint. To avoid growing the volume of the redo log, only one record will be emitted for each tablespace that was modified since the latest checkpoint.
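The "one record per tablespace per checkpoint" rule amounts to remembering which tablespaces have already been named. This is a toy Python model: the record tuples, the class `RedoLogWriter` and its methods are hypothetical, only the system tablespace (space_id 0) is treated as predefined, and the exact checkpoint-time bookkeeping in InnoDB is not modelled.

```python
SYSTEM_SPACE_ID = 0  # predefined tablespace: never named in this model


class RedoLogWriter:
    """Toy writer: at most one MLOG_FILE_NAME per tablespace per checkpoint."""

    def __init__(self):
        self.records = []
        self.named_since_checkpoint = set()

    def log_page_change(self, space_id, file_name, change):
        needs_name = (space_id != SYSTEM_SPACE_ID
                      and space_id not in self.named_since_checkpoint)
        if needs_name:
            # First change to this tablespace since the checkpoint: name its file.
            self.records.append(("MLOG_FILE_NAME", space_id, file_name))
            self.named_since_checkpoint.add(space_id)
        self.records.append(("PAGE_CHANGE", space_id, change))

    def checkpoint(self):
        # Simplified: the next change to any tablespace will name it again.
        self.records.append(("CHECKPOINT",))
        self.named_since_checkpoint.clear()


w = RedoLogWriter()
w.log_page_change(5, "./db1/t1.ibd", "insert")
w.log_page_change(5, "./db1/t1.ibd", "update")
w.log_page_change(0, None, "undo header")
print([r for r in w.records if r[0] == "MLOG_FILE_NAME"])
# [('MLOG_FILE_NAME', 5, './db1/t1.ibd')]
```

However many pages of a tablespace change between checkpoints, the log carries its file name only once.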
This change narrowly missed the MySQL 5.7.4 milestone release. It is included in the MySQL Labs Release based on 5.7.4. Note that this will change the redo log format, making upgrades and downgrades impossible unless the system was shut down cleanly before upgrading or downgrading.
The objective of this change is to eliminate the use of the file system as a ‘data dictionary’ during redo log processing (before applying the redo log):
- Do not read the first page of all *.ibd files.
- Do not check the contents of *.isl files.
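With file names in the log, the space_id mapping can be built from the redo log alone. This hypothetical Python sketch reuses a toy record format of `("MLOG_FILE_NAME", space_id, file_name)` tuples and a `("CHECKPOINT",)` marker; the real log is binary and recovery starts from the checkpoint LSN, which this model only approximates.

```python
def build_space_map(records):
    """Map space_id -> file name from MLOG_FILE_NAME records after the last checkpoint."""
    last_checkpoint = max(
        (i for i, rec in enumerate(records) if rec[0] == "CHECKPOINT"),
        default=-1,
    )
    space_map = {}
    for rec in records[last_checkpoint + 1:]:
        if rec[0] == "MLOG_FILE_NAME":
            _, space_id, file_name = rec
            space_map[space_id] = file_name
    return space_map


records = [
    ("MLOG_FILE_NAME", 4, "./db1/old.ibd"),
    ("PAGE_CHANGE", 4, "x"),
    ("CHECKPOINT",),
    ("MLOG_FILE_NAME", 5, "./db1/t1.ibd"),
    ("PAGE_CHANGE", 5, "insert"),
]
print(build_space_map(records))  # {5: './db1/t1.ibd'}
```

No directory is listed and no *.isl file is read; only the files named after the checkpoint need to be opened.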
These changes will improve reliability as follows:
- We can ignore extra *.ibd files that are not attached to the InnoDB instance. For example, if the system crashes before the completion of ALTER TABLE…IMPORT TABLESPACE, there could be files with duplicate space_id that could currently cause trouble. Thanks to the MLOG_FILE_NAME redo log records introduced in this feature, redo log apply can ignore such files unless there is a possible name clash due to
- We will not silently discard redo log records if some *.ibd file is missing and the redo log contains no MLOG_FILE_DELETE record for it. For example, if a file rename went bad and recovery failed because of a missing tablespace file, the DBA can manually rename the file and restart crash recovery. In
*.ibd files will continue to be ignored.
- Failure scenarios related to inconsistent *.isl files will be eliminated during redo log apply. Redo log records will contain references to *.ibd file names; the *.isl files will only be used after redo log apply, when opening tables.
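The first two rules can be expressed as a per-tablespace decision made only for files named in the redo log. This is a hedged Python sketch: `plan_redo_apply`, `MissingTablespaceError` and the `"apply"`/`"discard"` labels are hypothetical, and `file_exists` is injectable so that the example does not touch the disk.

```python
import os


class MissingTablespaceError(Exception):
    """Raised instead of silently discarding redo for a missing file."""


def plan_redo_apply(space_map, deleted_spaces, file_exists=os.path.exists):
    """Decide, per tablespace named in the redo log, what recovery should do.

    Files on disk that the log never names (for example leftovers of an
    interrupted import) are never looked at, and no *.isl file is consulted.
    """
    plan = {}
    for space_id, file_name in space_map.items():
        if space_id in deleted_spaces:
            plan[space_id] = "discard"  # MLOG_FILE_DELETE seen: the file is gone on purpose
        elif file_exists(file_name):
            plan[space_id] = "apply"    # the file named in the log is present
        else:
            # No MLOG_FILE_DELETE: stop so that the DBA can restore or rename the file.
            raise MissingTablespaceError(f"space_id {space_id}: {file_name} is missing")
    return plan


on_disk = {"./db1/t1.ibd", "./db1/stray.ibd"}
print(plan_redo_apply({5: "./db1/t1.ibd", 6: "./db1/t2.ibd"}, {6}, on_disk.__contains__))
# {5: 'apply', 6: 'discard'}
```

`./db1/stray.ibd` plays no part in the decision, while a missing file without a delete record stops recovery instead of losing its redo.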
This work is only a part of ongoing reliability improvements for InnoDB. Watch this space for updates. ☺