In a previous post I took the new JSON datatype and functions for a test drive, using sample data provided by SF OpenData. Today I will extend that example to include indexing.
Introducing Generated Columns
MySQL 5.7 introduces a new feature called generated columns. To start with a simple example:
CREATE TABLE t1 (
  id INT NOT NULL PRIMARY KEY auto_increment,
  my_integer INT,
  my_integer_plus_one INT AS (my_integer+1)
);
Here my_integer_plus_one is what is known as a virtual generated column. It exists in metadata only, and does not require any additional storage in memory or on disk for row data. You can think of it as similar to a view or a macro that is evaluated on demand.
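As a quick illustration (a small sketch, assuming the t1 table above), the generated value is never stored with the row, but is computed whenever the column is read:

# A minimal sketch: the generated column is not listed on INSERT,
# and its value (my_integer + 1) is computed when it is read.
INSERT INTO t1 (my_integer) VALUES (41);

SELECT my_integer, my_integer_plus_one FROM t1;
# Returns my_integer = 41 and my_integer_plus_one = 42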
The advantage of creating virtual columns is that they can be indexed. To extend the above example:
CREATE TABLE t1 (
  id INT NOT NULL PRIMARY KEY auto_increment,
  my_integer INT,
  my_integer_plus_one INT AS (my_integer+1),
  INDEX (my_integer_plus_one)
);
Generated columns effectively allow MySQL to index an expression, which is often referred to as a functional index. The index itself is material: it exists and is maintained just like any other index.
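As a quick check (a sketch, again using the t1 table above), a query that filters on the generated column can now use the new index:

# EXPLAIN should show ref access on the index over my_integer_plus_one.
EXPLAIN SELECT * FROM t1 WHERE my_integer_plus_one = 42\G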
Indexing JSON
As the MySQL manual notes:
JSON columns cannot be indexed. You can work around this restriction by creating an index on a generated column that extracts a scalar value from the JSON column.
Let's do exactly that by reusing the example from my previous blog post:
# My original query
SELECT * FROM features WHERE feature->"$.properties.STREET" = 'MARKET';

# Add a virtual column
# This operation is instant, since it only changes meta data
ALTER TABLE features ADD feature_street VARCHAR(30) AS (JSON_UNQUOTE(feature->"$.properties.STREET"));
Query OK, 0 rows affected (0.01 sec)
Records: 0  Duplicates: 0  Warnings: 0

# Add an index
# This is an online operation, but may take some time
# for a large amount of data
ALTER TABLE features ADD INDEX (feature_street);
Query OK, 0 rows affected (9.41 sec)
Records: 0  Duplicates: 0  Warnings: 0
This allows me to repeat the query using either the virtual column or the original expression:
# Using the JSON extract shorthand operator
mysql> EXPLAIN SELECT * FROM features WHERE feature->"$.properties.STREET" = 'MARKET'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: features
   partitions: NULL
         type: ref
possible_keys: feature_street
          key: feature_street
      key_len: 33
          ref: const
         rows: 808
     filtered: 100.00
        Extra: NULL
1 row in set, 1 warning (0.00 sec)

# Using the virtual column
mysql> EXPLAIN SELECT * FROM features WHERE feature_street = 'MARKET'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: features
   partitions: NULL
         type: ref
possible_keys: feature_street
          key: feature_street
      key_len: 33
          ref: const
         rows: 808
     filtered: 100.00
        Extra: NULL
1 row in set, 1 warning (0.00 sec)
It is important to note that, as demonstrated by EXPLAIN, both variations of the above query use the index. That is to say, the optimizer looks for indexed virtual columns whose definitions are compatible[1] with the expressions used in JSON queries.
[1] Note that use of JSON_UNQUOTE is required in the virtual column expression, since the return type of JSON_EXTRACT (and shorthand operator ->) needs to be cast to a string.
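To illustrate why (a small example, not taken from the original data set), compare the two return values directly:

# JSON_EXTRACT (and ->) return a JSON value; for a string this keeps the quotes.
SELECT JSON_EXTRACT('{"STREET": "MARKET"}', '$.STREET');
# Returns "MARKET" (a JSON string, including the quotes)

SELECT JSON_UNQUOTE(JSON_EXTRACT('{"STREET": "MARKET"}', '$.STREET'));
# Returns MARKET (a plain SQL string, suitable for a VARCHAR column)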
Stored Generated Columns
For completeness, it is important to point out that the examples above use the default VIRTUAL variety of generated columns. There also exists a second variety of generated columns that can be STORED in the actual rows of tables:
ALTER TABLE features
  ADD feature_type VARCHAR(30) AS (JSON_UNQUOTE(feature->"$.type")) STORED;
STORED generated columns are automatically maintained and kept up to date as tables are modified. It is recommended that you use the VIRTUAL variety of generated columns unless:
- You need to index a PRIMARY KEY
- You need a FULLTEXT or RTREE index (instead of regular BTREE)
- The virtual column is computationally expensive, and you anticipate large amounts of scanning (i.e. not using indexes)
In the first two cases the STORED variety of generated columns is required; in the third case it simply tends to result in faster queries.
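For example, the FULLTEXT case might look like the following. This is a hypothetical sketch only: the "$.properties.NAME" path does not exist in the sample data set and is used purely for illustration:

# Hypothetical sketch: a FULLTEXT index requires a STORED generated column.
# The "$.properties.NAME" path is made up for illustration.
ALTER TABLE features
  ADD feature_name VARCHAR(200) AS (JSON_UNQUOTE(feature->"$.properties.NAME")) STORED;

ALTER TABLE features ADD FULLTEXT INDEX (feature_name);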
Conclusion
Hopefully this demonstrates how virtual columns can be used to add indexes to JSON data. Virtual columns can index other data besides JSON, but this is obviously one of the main use cases. Having the optimizer also understand how to use indexes from virtual columns makes the usage quite flexible too!
Hi
Thank you as always for your articles.
I translated this article into Japanese for users in Japan.
The translation is available here:
https://yakst.com/ja/posts/3841
If there is any problem, please get in touch with me.
Thanks again.
Great article!
What if the property I want to index is inside an array or object? For example, if the JSON property is an array of userIds, and I’m querying for a userId in the array. What would be the best indexing approach?
Hi Glide, the virtual column does need to map to a MySQL data type. So at the moment it is not possible to index a JSON array, but this is something we hope to add in the future.
I’m not sure how practical a workaround it is, but one thing you could consider is having multiple virtual columns, one for the nth item of the array, and indexing those.
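A rough sketch of that workaround (the docs table and userIds field are made up for illustration):

# Hypothetical sketch: index the first two elements of a JSON array of user ids.
CREATE TABLE docs (
  id INT NOT NULL PRIMARY KEY AUTO_INCREMENT,
  doc JSON,
  user_id_0 INT AS (doc->"$.userIds[0]"),
  user_id_1 INT AS (doc->"$.userIds[1]"),
  INDEX (user_id_0),
  INDEX (user_id_1)
);

# Queries then have to check each indexed position explicitly:
SELECT * FROM docs WHERE user_id_0 = 1234 OR user_id_1 = 1234;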
Thanks for your response! Always appreciate when the author actually replies.
Another question: I’m trying to evaluate the JSON type by inserting a large collection of JSON (specifically https://github.com/zemirco/sf-city-lots-json).
You can see from the link that it’s a large file with an array of objects . Ideally I’d want to insert each object of the array into a db record.
What’s the easiest way to do so?
My previous post has an example of how to import this exact JSON document 🙂 See : http://mysqlserverteam.com/taking-the-new-mysql-5-7-json-features-for-a-test-drive/ (under ‘Importing Sample Data’)
Hi Morgan, do you know when the ability to index JSON arrays will be added? This feature is very important to me.
Hi Michael, this is something we have on our roadmap. I agree with you here – it is very useful.
Hi Morgan, is this still something you are looking to implement and if so when? Thanks 🙂
Thanks for the post.
5.7 being unable to index a JSON column directly is a deal breaker for those looking for document database features.
I am sure it will be supported in the next release(s).