Partial update of JSON values

MySQL 8.0 introduces partial update of JSON values, which is a nice performance improvement for applications that frequently update small portions of large JSON documents. Before, in MySQL 5.7, whenever you made a change to a stored JSON document, the full new JSON document would be written to the database, even if the update just changed a few bytes in the document. What’s new in MySQL 8.0, is that MySQL is able to recognize that certain UPDATE statements can modify JSON documents in place, and for these statements it will provide the storage engine with patches that describe the modifications. The storage engine can use these patches to write the minimal amount of data. The patches are also used by row-based replication to reduce the amount of binary log shipped to the replication slave, if the proper option is enabled. This can result in significantly reduced disk I/O and network I/O for update-intensive workloads.

How to use it

So what do you have to do to get the benefit of partial update of JSON values in your application? In many cases nothing, apart from upgrading to MySQL 8.0. There is no new syntax for partial update. Instead, MySQL analyzes each UPDATE statement and enables it when it can, so your application will benefit from the improvements automatically if the UPDATE statements already are on a form that is recognized as partial update.

A simple example is the following statement, which changes the name attribute in a JSON value:

This update can be performed as a partial update because it takes a JSON column (json_col), modifies one of its members, and then stores it into the same column as it reads from (json_col). Additionally, the JSON value being updated must already have a member called name, and there must be enough existing space in the JSON value to hold the new name. If the JSON value being updated is something like {"name": "John", ...}, the storage engine will be told that it can simply overwrite the four bytes in the string “John” with the four bytes in “Knut”, and the rest of the value can be left untouched.

If the JSON value being updated had contained a longer name, like {"name": "Peter", ...}, the update could still be performed as a partial update. In that case, the storage engine would be told to overwrite the first four bytes of “Peter” with “Knut”, and to overwrite the length byte of the name attribute with the value 4 to match the new length.

If, on the other hand, the original JSON value had contained a shorter name, like {"name": "Joe", ...}, there would not be enough room for the new name. In this case, the update will fall back to performing a full update, which means that it writes the whole new JSON value to the database. Similarly, if the original JSON value didn’t have a name attribute at all, a full update would be performed to make room for the new attribute.

Partial update is not limited to the JSON_SET function. It can also be used with the JSON_REPLACE and JSON_REMOVE functions. You can even combine these functions, as long as the input column is the same as the destination column. And you can combine partial update of JSON columns with ordinary updates of other columns in the same statement. Both of these UPDATE statements will attempt to partially update the JSON column:

 

Performance impact

Let’s see how this affects the performance when we update some large JSON values. First, let’s create a table with some big documents:

This creates a table with 16 JSON documents, where each document is approximately 10MB.

Now we try to increment the age attribute in each of the documents with MySQL 5.7:

If we try the same UPDATE statement with MySQL 8.0, we see that it is many times faster:

Although it’s much faster, three seconds for updating a few bytes in 16 documents is still not super fast. The reason is that MySQL 8.0 has binary logging enabled by default, also in non-replicated environments. This means that both the before image and the after image of the big JSON values are written to the binary log. As mentioned earlier, an option was added to make the binary log contain patches instead of the full JSON documents when we have partial updates. Let’s try to enable that option:

That’s more like it. An UPDATE statement that took around 13.5 seconds with MySQL 5.7, takes only 3 seconds with MySQL 8.0 in its default configuration. And with a little bit of extra tuning, we get it down to around a quarter of a second. That’s more than 50 times faster.

Conclusion

MySQL 8.0 speeds up some updates of JSON values, if the updates change small parts of big JSON documents using the JSON_SET, JSON_REPLACE and JSON_REMOVE functions. The speed-up can be seen out of the box. If binary logging is enabled, there are options to make the binary log contain only the changed parts of the JSON values to further improve the update performance.

More details can be found in the Partial Update of JSON Values section of the reference manual.

Thank you for trying MySQL 8.0!

18 thoughts on “Partial update of JSON values

    1. Yes. We do a copy-on-write to ensure that both old and new data is available. I’m planning to write an article on the InnoDB side of things after completing some pending worklogs. That will explain it in more detail.

  1. Hi Knut,

    I guess similar work is done (or is being done) for partial updates of TEXT and BLOB, right ?

    Have you also considered partial updates of VARCHAR and VARBINARY ?

    Thanks.

    1. Hi Jean-François,

      The changes in the storage layer (InnoDB) should work equally well for TEXT/BLOB as for JSON. But at the moment there is no syntax for partially updating TEXT or BLOB columns, so the SQL layer never asks the storage layer to use partial update on such columns. For JSON columns we already had existing syntax for expressing partial updates. There aren’t any immediate plans to add partial update syntax for TEXT and BLOB.

      VARCHAR and VARBINARY haven’t been considered yet. If syntax is added to support TEXT/BLOB, we could evaluate whether it’s worth enabling it for those data types too. I suppose it will depend on how big benefits we’re seeing from partial update on smaller values, and how much has to be changed in the storage layer to support non-BLOB based column types.

Leave a Reply