In this article, I’ll explain about the multi version concurrency control (MVCC) of large objects (LOBs) design in the MySQL InnoDB storage engine. MySQL 8.0 has a new feature that allows users to partially update large objects, including the JSON documents. With this partial update feature the way MVCC works for LOBs has changed when there are partial updates to LOBs. For normal updates (the full update), MVCC will work as it did in previous versions. Lets look at how MVCC works when partial update is not involved, and then consider the use case of partial updates to LOBs.
MVCC For Regular Updates
I use the term regular updates to refer to updates that are not partial updates. I will explain how MVCC works for regular updates of large objects with an example. I’ll use the following mtr test case for that purpose:
create table t1 (f1 int primary key, f2 longblob) engine=innodb;
insert into t1 values (1, repeat('a', 65536));
update t1 set f2 = repeat('b', 65536) where f1 = 1;
--echo # Connection con1:
--echo # Must see the old value 'aaaaaaaaaa'
select f1, right(f2, 10) from t1 order by f1;
--echo # Connection default:
drop table t1;
It’s important to understand the the above test case carefully in order to understand the explanations that follow.
The test scenario is as follows:
- Initially the table t1 contains a single record (R1).
- A transaction trx1 updates the record to a new value.
- While trx1 is still active, another transaction trx2 is reading the record. It will read the old value.
The table t1 contains only one record (R1). But trx1 and trx2 will see two different values. The table actually contains only the latest value (the value as seen by trx1) whereas the value or the record seen by trx2 is obtained from the undo log record. Let us look at the following pictures to understand it better.
Initial State: Before Update Operation
The following diagram visualizes the situation before the update operation. The undo log is empty. The clustered index of the table contains one row. There is one LOB in the table. The clustered index record contains a reference to the LOB.
Final State: After Update Operation
Now let us look at the situation after the update operation.
Here are some important observations:
- There are two LOBs in the user tablespace – the old one and the new one. The old LOB is accessible only through the undo log. The clustered index record points to the new LOB.
- The update operation has created a undo log record containing the update vector. This undo log record points to the old LOB.
- The clustered index record points to the undo log record via the DB_ROLL_PTR system column. This roll pointer points to the undo log record which can be used to construct the previous version of the clustered index record.
- The undo record does not contain the LOB itself. Rather it only contains the reference to the LOB which is stored in the user tablespace.
- The LOB reference stored in the undo log record is different from the LOB reference stored in the clustered index record.
The steps taken by the transaction in connection 1 are as follows:
- The transaction looks at R1 and determines that the transaction that modified the clustered index record, is not yet committed. This means it cannot read that record (because the default isolation level is REPEATABLE READ).
- It looks at the DB_ROLL_PTR in R1 and finds the undo log record. Using the undo log record the previous version of R1 is constructed.
- It reads this constructed older version of R1. Note that this version is not available in the clustered index record. But rather it is built on the fly using the undo record.
- While R1 points to the new LOB, this constructed old version of R1 points to the old LOB. So the result contains the old LOB.
This is how the MVCC of LOBs work when the partial update is not involved.
MVCC for Partial Updates
Let us look at another example to understand how MVCC works in the case of partial updates. We need another example because currently partial update is supported only for JSON documents via functions json_set() and json_replace().
create table t2 (f1 int primary key, j json) engine=InnoDB;
set @elem_a = concat('"', repeat('a', 200), '"');
set @elem_a_with_coma = concat(@elem_a, ',');
set @json_doc = concat("[", repeat(@elem_a_with_coma, 300), @elem_a, "]");
insert into t2 (f1, j) values (1, @json_doc);
update t2 set j = json_set(j, '$', repeat('b', 200)) where f1 = 1;
--echo # Connection con1:
--echo # Must see the old value 'aaaaaaaaaa...'
select json_extract(j, '$') from t2;
--echo # Connection default:
drop table t2;
The scenario is the same as previous example. Just the longblob field has been changed to a JSON document. The loaded data is also slightly different to conform to the JSON format.
Tips: You can add the statement set debug = ‘+d, innodb_lob_print’ in the above mtr test cases (both of them) to print the LOB index in the server log file. The LOB index will be printed just after inserting it. The LOB index will provide you with the structure of the stored LOB object.
Before Partial Update Operation
The initial condition before a full or partial update operation is the same and is already presented above. But in the following diagram, some additional information is presented.
Let us look at the additional information shown in diagram:
- The LOB reference stored in the clustered index record now contains a LOB version number v1. During an initial insert operation this is set to 1 and is incremented for every partial update.
- Each LOB data page has one entry in the LOB index. Each of this entry contains the LOB version information. Whenever one LOB data page is modified, it is copied into a new LOB data page with new data, and a new LOB index entry will be created with an incremented LOB version number.
The additional information is the LOB version number. This is available in LOB reference in the clustered index record and also in each entry of the LOB index.
After Partial Update operation
The following diagram illustrates the situation after a partial update operation.
The most important optimization here is that there is still only one LOB in the user tablespace. Only those LOB data pages that needs to be modified are updated. This single LOB after a partial update operation contains both the old and new version of the LOB. The v1 and v2 labels on the LOB data pages in the diagram illustrates this.
Another important observation is that the LOB reference in both the undo log and the clustered index record point to the same LOB. But the LOB reference contains different version numbers. The LOB reference in the undo log record contains v1 (the old version number) and the LOB reference in the clustered index record contains v2, the new version number.
The Purpose of LOB Version Number
As shown above, different LOB references with different version numbers point to the same LOB. A single LOB contains parts from different versions. The LOB version number is used to obtain the correct version pointed to by various LOB references. In this section we will look at how this is done.
The LOB index contains a list of LOB pages that make up the LOB. It contains the page numbers of the LOB data pages, amount of data each LOB data page contains and also the version number. Each node of this list is called the LOB index entry. And each LOB index entry contains a list of older versions. Let us look at a diagram illustrating this structure for the partial update test case above.
Initially, before partial update is done, the LOB index contains a total of 4 entries. The page number of the four entries are 5, 6, 7 and 8. None of the LOB index entries have older versions. All the four entries have version number 1.
After the partial update is done, we notice that the page number 9 has replaced the page number 7, and that the page number 7 is now considered as an older version of page number 9. The page number 9 has version number as 2, and the page number 7 has version number 1.
After partial update is done, when the LOB is accessed via a LOB reference with version number 1, then first index entry with page number 5 will be looked into. It has version number 1. If the version number in the index entry is less than or equal to what is available in the LOB reference, then that entry will be read. So page number 5 will be read. Then the index entry with page number 6 will be looked into. This has version number 1, so it will be read. Then the index entry with page number 9 will be looked into. This has version number 2. But the lob reference has version number 1. If the version number in the index entry is greater than what is available in the LOB reference, then that entry will not be read. Since the entry with page number 9 has version 2, its older version will be looked into. The index entry with page number 7 will be checked. It has version number 1, so it will be read. After this the index entry with page number 8 will be checked. It has version number 1, and hence it will also be read. This is the way an older version of LOB will be accessed.
After partial update is done, when the LOB is accessed via a LOB reference with version number 2, then first index entry with page number 5 will be looked into. It has version number 1. If the version number in the index entry is less than or equal to what is available in the LOB reference, then that entry will be read. So it will read page number 5, 6, 9, 8 in that order. Since the version number will always be <= 2, there will not be a need to access the page number 7 with older version.
One important point to remember is that the LOB does not exist independently in InnoDB. It is to be considered as an extension of the clustered index record. Whether a transaction can see an LOB or not, is not handled at the LOB module. Rather, this is done for a clustered index record. If a transaction accesses an LOB it means that it has already determined with the help of DB_TRX_ID in the clustered index record that it can view the LOB (rather a particular version of the LOB). So we do not worry about that aspect in the LOB module. We only focus on providing the correct contents for a given LOB version number.
In this article we saw how MVCC is done for large objects in InnoDB. When partial update is done for LOBs, multiple LOB references can point to the same LOB. But they will be having different version numbers. Using these LOB version numbers, it is possible to access the right LOB contents.
Hope you find this information useful.
And thank you for using MySQL !