JSON Labs Release: JSON Functions, Part 1 — Manipulation JSON Data

The MySQL 5.7.7 JSON Lab release introduces a native JSON datatype. See Knut Anders Hatlen’s blog post for more details on this new datatype. In this release we also introduced a number of functions for creating and querying JSON documents. In this post we’ll explore the following new functions related to manipulating JSON documents:

  • jsn_array()
  • jsn_object()
  • jsn_insert()
  • jsn_remove()
  • jsn_set()
  • jsn_replace()
  • jsn_append()
  • jsn_merge()
  • jsn_extract()

Dag Wanvik’s follow up blog post will explore the functions related to querying and searching of JSON documents. More information on all of the new JSON functions can be found in the high level spec attached to WL#7909.

Creating JSON Data

Let’s start out by creating some simple JSON data. Suppose that we were using the JSON datatype to store large numbers of temperature readings coming from smart thermostats. We might find the following tables useful:

Imagine that the ‘capabilities’ document is just an array of tags. Let’s create one of these arrays by using the new JSN_ARRAY() function. This function evaluates a variable length list of expressions and inserts them into a JSON array, one after the other:

That’s easy. Alternatively, we could just insert valid JSON text into a JSON column. MySQL will then parse the text and convert it into the native JSON binary storage format:

In addition to JSN_ARRAY(), there’s also a JSN_OBJECT() constructor function. As you might guess, JSN_OBJECT() builds JSON objects from a variable length list of key/value pairs. Here’s an example:

Again, we can alternatively just put valid JSON text into our table of thermostat readings:

Schema Evolution: Adding Data

Of course, in the real world, data changes over time, in-step with the applications which use it. Suppose that we, the thermostat maker, realize that we can’t tell whether our thermostats are actually on and therefore doing their jobs. Let’s say that we remotely upgrade all of our thermostats so that they start reporting their on/off state. Eventually, our table of thermostat readings will contain old records which don’t report on/off state as well as new records which do:

We now want to fix the old data so that it also contains the on/off state. For our purposes here, it’s ok that the old records say that the on/off state is NULL (meaning “unknown”). We can use the new JSN_INSERT() function to do this. JSN_INSERT() adds missing data to JSON documents. However, it won’t override data that’s already there. JSN_INSERT() takes 3 arguments: the document to be modified, a path expression (see below) locating the part of the document to change, and a value to be inserted at the indicated location:

Note that the old rows say that they don’t know what the on/off state was (NULL value). Also note that the new rows weren’t touched.

Looking more closely at the UPDATE statement, you might wonder:

  1. What’s that strange ‘$.on‘ argument?
  2. Why do I need to CAST my NULL to JSON?

First, let’s talk about the ‘$.on‘ argument. That’s a path expression. It’s the address of a value inside a JSON document. A path expression starts with a $, which indicates the root of the path. For all of the new JSON functions, the root is always indicated by $ and it always means “the selected document”. After the $ come an arbitrarily long sequence of member names and array indexes, which drill down to values nested (possibly deeply) inside the document. Member names are indicated by . and then the name of a key. Array indexes are just numbers enclosed in square brackets. For example, the path expression $.fred.children[3] means “The 3rd child of fred inside the selected document”. Don’t worry too much about this right now though. The path expressions in this blog are all very simple. For example, in our UPDATE statement above, $.on means “The ‘on’ member of the selected document”.

All right, now what about that CAST? Without the CAST, the NULL argument would cause the whole JSN_INSERT() expression to evaluate to NULL. And that would clobber the old rows. Furthermore, CAST(NULL AS JSON) would have the same effect. The problem is that there is a difference between a JSON null literal and a SQL NULL. The way that you create a JSON null literal is to CAST some valid JSON text which contains just the null literal.

Don’t worry if this seems tricky right now. As you play around with the new JSON functions, you will quickly learn how to handle JSON null literals. For the rest of this blog post, we won’t have to bother with SQL NULLs.

Schema Evolution: Removing Data

Any time that you add data, you need to have the option to remove it later on. Suppose that we, the thermostat maker, decide that it was a bad idea to add that on/off state. We’ll need to remotely upgrade all of our thermostats again. We’ll tell them to stop reporting on/off state. Afterward, we’ll want to scrub the on/off cruft from our old thermostat readings. For this purpose, we use the new JSN_REMOVE() function. JSN_REMOVE() takes a document argument along with a path expression locating the data to be removed. Here’s how we remove all of those on/off indicators:

Schema Evolution: Updating Data

Suppose that we, the thermostat maker, upgrade our thermostats so that they contain a new piece of data, the “spread” or difference between the thermostat setting and the actual room temperature. Newer thermostat readings will contain this new data but older readings won’t:

We can use the JSN_SET() function to update the old records. JSN_SET() behaves much like JSN_INSERT() except that it also overrides existing values:

Note that “spread” values are added to records which didn’t have them. And we updated the existing “spread” values. Oops! We didn’t mean to do that! We got the computation backward. We meant to compute “spread” the opposite way, the way that the thermostats compute it, viz., by subtracting the actual temperature from the thermostat setting. Not a problem. We’ll fix that in a moment. But before we do, you might be wondering about that other function which we just introduced, JSN_EXTRACT(). JSN_EXTRACT() returns a value nested inside of a JSON document. JSN_EXTRACT() locates the value by its path address.

OK, now let’s fix our thermostat readings. For this task, we use another update function, JSN_REPLACE(). JSN_REPLACE() behaves the opposite way that JSN_INSERT() does: JSN_REPLACE() won’t touch records with missing data. JSN_REPLACE() only updates records where the data is present:

Schema Evolution: Appending Data

Now suppose that we need to append new data to old data, rather than completely clobbering the old data. Suppose that we, the thermostat maker, have upgraded our thermostats. We’ve added a variable speed fan control to one of the models. After the upgrade, we need to record the new capability in our thermostat metadata. We can use JSN_APPEND() to do this. JSN_APPEND() adds a value to the end of an array:

Now suppose that we further upgrade one of our thermostat models with a package of several security features. Let’s use the new JSN_MERGE() function to correspondingly update our metadata. JSN_MERGE() is a very powerful function. It can merge two arrays together, and that’s what we’re going to do. (It can also merge two objects together, member by member, level by level.) JSN_MERGE() takes a variable length list of JSON documents and it merges them into one composite document:

Schema Evolution: Indexing JSON Data

We started out dumping our thermostat readings onto a heap. Over time, we will want to query these readings. So we’ll want a useful index. Let’s add a virtual column to the thermostat reading table and then index that column. This will allow us to quickly look up thermostat readings based on the thermostat’s unique id:

Please see Jimmy Yang’s blog post for further details on the new “functional index” work that has also been added in the MySQL 5.7.7 JSON Lab release.

Now you’re ready to create and manipulate JSON data! You’re encouraged to go on and read Dag Wanvik’s follow up blog post on this topic. His blog post describes the new query and search related JSON functions introduced by the MySQL 5.7.7 JSON Lab release, namely:

  • jsn_search()
  • jsn_contains()
  • jsn_contains_path()
  • jsn_valid()
  • jsn_type()
  • jsn_keys()
  • jsn_length()
  • jsn_depth()
  • jsn_unquote()
  • jsn_quote()

Please let us know what you think of these new JSON features! We’d love to hear your feedback on what else you’d like to see related to our wider JSON support. If you encounter any problems with these new features, please let us know here in the comments, open a bug report at bugs.mysql.com, or open a support ticket.

Thank you for using MySQL!

About Rick Hillegas

Rick Hillegas has been working in the MySQL Optimizer group since October, 2014. He is a Consulting Member of the Technical Staff, employed by Oracle, based in San Francisco. Before coming to the MySQL group, Rick worked on SQL interpreters for a number of other databases, including Ingres, Sybase, and Cloudscape (aka Apache Derby).

7 thoughts on “JSON Labs Release: JSON Functions, Part 1 — Manipulation JSON Data

Leave a Reply

Your email address will not be published. Required fields are marked *

Please enter * Time limit is exhausted. Please reload CAPTCHA.