GeoJSON Functions

In recent years, GeoJSON has become a popular format for exchanging GIS data, primarily because it is easy to read, simple, and lightweight. In MySQL 5.7.5, we added support for parsing and generating GeoJSON documents via two new functions: ST_GeomFromGeoJson() and ST_AsGeoJson(). These functions make it easy to connect MySQL with other GeoJSON-enabled software and services, such as the Google Maps Javascript API.

Since GeoJSON is a JSON format, we needed a library to parse and write JSON documents. After evaluating several candidates, we ended up with rapidjson due to its features, speed, and compatible license.

The new functions support all of the geometry types specified in the GeoJSON specification, as well as collections. The parsing function also extracts the geometry information from feature objects, as shown below:
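Here is a minimal sketch of such a call (the GeoJSON document and its values are illustrative, not taken from the original post):

SET @json = '{ "type": "Feature",
               "geometry": { "type": "Point", "coordinates": [102.0, 0.5] },
               "properties": { "name": "A point of interest" } }';
SELECT ST_AsText(ST_GeomFromGeoJSON(@json));
-- Only the geometry member is extracted: POINT(102 0.5)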

When creating GeoJSON documents from spatial data, you can specify the maximum number of decimal digits used in the output. If GeoJSON output length is a concern for you, this is a simple and efficient way to reduce it when your geometries contain many coordinates with long decimal fractions.
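A minimal sketch of this (the coordinates are illustrative): the second argument of ST_AsGeoJSON caps the number of decimal digits.

SELECT ST_AsGeoJSON(ST_GeomFromText('POINT(56.7386246 12.8965226)'), 3);
-- Approximately: {"type": "Point", "coordinates": [56.739, 12.897]}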

You can even add a URN in the OGC namespace to the GeoJSON output. The function uses the SRID from the geometry input, and outputs a short or long format OGC URN depending on the options argument (note that the output below is formatted for readability):
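As a sketch of how this can be requested (the SRID and values are illustrative): the third argument of ST_AsGeoJSON takes option flags, where flag 2 requests the short-format URN and flag 4 the long format.

SELECT ST_AsGeoJSON(ST_GeomFromText('POINT(1 1)', 4326), 10, 4);
-- Approximately:
-- {"type": "Point",
--  "crs": {"type": "name",
--          "properties": {"name": "urn:ogc:def:crs:EPSG::4326"}},
--  "coordinates": [1, 1]}
-- Passing 2 instead of 4 would produce the short format "EPSG:4326".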

URNs in the OGC namespace are also recognized by the parsing function, and are stored in the geometry output from the function.

What about Importing GeoJSON with 3D Geometries?

You might think that your lovely collection of GeoJSON documents with 3D geometries is useless with these functions, and that they can't be imported, but this is exactly what the options argument of ST_GeomFromGeoJson is for. By default, the parsing function refuses to parse documents with 3D geometries. We realize, however, that this would reject a lot of existing documents out there, so setting the options argument to 2, 3, or 4 allows you to import these documents:
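A minimal sketch of that, using an illustrative 3D point: with the default options the document is rejected, while option value 3 accepts it and strips the third coordinate.

SET @json3d = '{ "type": "Point", "coordinates": [102.0, 0.5, 10.0] }';
SELECT ST_AsText(ST_GeomFromGeoJSON(@json3d));      -- rejected with an error by default
SELECT ST_AsText(ST_GeomFromGeoJSON(@json3d, 3));   -- accepted: POINT(102 0.5)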

Option values of 2, 3, and 4 all have the same effect for now, but in the future when 3D geometries are supported, they will produce different results. We recommend using option 3, since this will mean that documents with 3D geometries will be rejected when 3D is introduced in MySQL. This will ensure that the change of behavior won’t happen silently and that you can take the appropriate action in your applications.

We look forward to your feedback on the new Geohash and GeoJSON features! Please let us know if you have any comments or if you run into any bugs. Thank you for using MySQL!

Geohash Functions

In MySQL 5.7.5, we introduced new functions for encoding and decoding Geohash data. Geohash is a system for encoding and decoding longitude and latitude coordinates in the WGS 84 coordinate system, into a text string. In this blog post we will take a brief look at a simple example to explain just how geohash works.
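Before diving into the algorithm, here is a quick sketch of how the new functions can be called (the coordinates and output length are illustrative):

SELECT ST_GeoHash(10.4, 63.4, 8);                     -- encode a longitude/latitude pair into 8 characters
SELECT ST_LongFromGeoHash('u5r2vty0'),
       ST_LatFromGeoHash('u5r2vty0');                 -- decode a geohash back into coordinates
SELECT ST_AsText(ST_PointFromGeoHash('u5r2vty0', 0)); -- decode into a POINT geometry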

Where on earth is “u5r2vty0”?

Imagine you get an email from your friend, telling you that there is free food at “u5r2vty0”. But where on earth is “u5r2vty0”?

The first step in converting from “u5r2vty0” to latitude and longitude data is decoding the text string into its binary representation. Geohash uses base32 characters, and you can find the character mapping in several online resources. Using that mapping, the decoded string is as follows:

“u5r2vty0” = 11010 00101 10111 00010 11011 11001 11110 00000

The binary digits are then used to repeatedly divide the earth in two, increasing the accuracy with each digit used. Without using any digits, all we know is that the location is somewhere in the range [-180, 180] longitude and [-90, 90] latitude, which is anywhere on the earth! This won’t help you much in finding the food. Note that longitude 0 is the IERS Reference Meridian, and latitude 0 is the equator.

The first digit will halve the longitude, where ‘0’ means discarding the right half and ‘1’ means discarding the left half. In this example, we discard the left half since the first digit is a ‘1’. Now we know that the food is located somewhere in the range [0, 180] longitude and [-90, 90] latitude, and thus you know that you should not be heading towards the Americas.

[Figure: geohash_first_digit]

The next digit halves the latitude, using the same rules. Since the second digit is also a ‘1’, we discard the lower half, and now we are in the range [0, 180] longitude and [0, 90] latitude. We still have quite a few digits left before we know where the free food is, but as you can see, the accuracy increases dramatically with just the first two digits.

[Figure: geohash_second_digit]

The third digit halves the longitude again, and by now you probably get the idea of how decoding a geohash string works. Can you figure out where the free food is?

[Figure: geohash_third_digit]

Properties and uses of Geohash

  1. Since a coordinate tuple is represented as a string, it is possible to index and search for geographic data in one-dimensional indexes (a B-tree, for example).
  2. As you might have figured out from the example, two geohash strings with a common prefix are close to each other. This makes proximity searches easy (see the sketch after this list), but remember that the opposite can also be true in some cases; two geohash strings with different prefixes can be located right next to each other. Imagine two locations on either side of the IERS Reference Meridian, only a few meters apart. One geohash string will begin with a ‘0’, and the other will begin with a ‘1’.
  3. The accuracy of a geohash string depends on the number of characters. The more characters you use, the more accurate the position. Thus you can remove characters from the end to gradually reduce its size and accuracy as required.
  4. It has been proposed as a format for geotagging.
  5. Since similar locations (often) share similar prefixes, it is simple to spread geohashes across different partitions, with similar locations ending up on the same partition.
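As a sketch of the indexing and proximity-search properties above (the table, column, and prefix length are illustrative, and the size of the cell covered by a 4-character prefix is only approximate):

CREATE TABLE places (
  name    VARCHAR(100),
  geohash CHAR(10),
  KEY idx_geohash (geohash)
);
-- Rows whose geohash shares the prefix 'u5r2' lie within the same cell,
-- roughly a few tens of kilometres across, and the B-tree index can be
-- used for the prefix search:
SELECT name FROM places WHERE geohash LIKE 'u5r2%';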

Improvements to STRICT MODE in MySQL

As a part of improving error handling, we have re-implemented STRICT SQL mode in MySQL 5.7.5.

STRICT mode in MySQL affects the errors that arise from invalid, missing, or out of range values in DML statements such as INSERT, UPDATE, and DELETE. The new implementation aims to make the behavior of STRICT mode more consistent, yet maintain backward compatibility as much as possible.

In MySQL 5.7.5, we have made three general improvements to STRICT mode behavior:

 1. STRICT mode got simpler

It was observed that having a large number of SQL modes dependent on STRICT mode creates confusion among users. Specifically, we are talking about the NO_ZERO_DATE, NO_ZERO_IN_DATE, and ERROR_FOR_DIVISION_BY_ZERO modes. You can see further details on these SQL modes here. These modes only took effect if STRICT mode was also enabled; otherwise they only produced warnings.

In MySQL 5.6 we deprecated these three modes, and in 5.7.4 we made their functionality part of STRICT mode itself. Basically, the effects of enabling these three modes have been folded into STRICT mode. The modes themselves are still retained for backward compatibility and to avoid upgrade issues, but setting them has no effect from 5.7.4 onwards, and they will be removed completely in 5.8. More details can be found here: WL#7467. You can also check the documentation for this task here.
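As a minimal sketch of the folded behavior described above (the table and values are illustrative):

CREATE TABLE d (dt DATE, q INT);
SET sql_mode = 'STRICT_ALL_TABLES';
INSERT INTO d VALUES ('0000-00-00', 1);   -- rejected: zero dates are errors under the new STRICT mode
INSERT INTO d VALUES ('2014-10-01', 1/0); -- rejected: division by zero is an error under the new STRICT mode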

 2. STRICT mode got better

It was also observed that the implementation of STRICT mode was not very good, and we got many related bug reports. For example, see Bug #42910 (triggers override strict sql_mode).

The behavior of Stored Routines with respect to STRICT mode was not consistent either. Sometimes, they were allowed to perform actions which are otherwise prohibited in STRICT mode.

The problem was that there was no central place in the code where the statements and errors affected by STRICT mode were handled. The code had the abort_on_warning flag which was switched off and on at various places across the codebase, making the implementation hard to understand and error prone.

In MySQL 5.7.4, we have re-implemented STRICT mode. The usage of the abort_on_warning flag is completely removed. Now, for each statement affected by STRICT mode, we push an error handler which is active during execution of the statement, and pop it when the statement execution is finished. There is also now a central place in the code where all errors that are generated during statement execution, and which are affected by STRICT mode, are handled. For additional information, you can look here: WL#6891.

Following is the list of errors affected by STRICT mode:

  • ER_TRUNCATED_WRONG_VALUE –  “Truncated incorrect value”
  • ER_WRONG_VALUE_FOR_TYPE –  “Incorrect value for function”
  • ER_WARN_DATA_OUT_OF_RANGE –  “Out of range value”
  • ER_TRUNCATED_WRONG_VALUE_FOR_FIELD –  “Incorrect value for column at row”
  • WARN_DATA_TRUNCATED –  “Data truncated for column”
  • ER_DATA_TOO_LONG –  “Data too long for column”
  • ER_BAD_NULL_ERROR –  “Column cannot be null”
  • ER_DIVISION_BY_ZERO –  “Division by 0”
  • ER_NO_DEFAULT_FOR_FIELD –  “Field doesn’t have a default value”
  • ER_NO_DEFAULT_FOR_VIEW_FIELD –  “Field of view underlying table doesn’t have a default value”
  • ER_CUT_VALUE_GROUP_CONCAT –  “Row was cut by GROUP_CONCAT()”
  • ER_DATETIME_FUNCTION_OVERFLOW –  “Datetime function field overflow”
  • ER_WARN_NULL_TO_NOTNULL –  “Column set to default value: NULL supplied to NOT NULL column”
  • ER_WARN_TOO_FEW_RECORDS –  “Row doesn’t contain data for all columns”
  • ER_TOO_LONG_KEY –  “Specified key was too long”
  • ER_WRONG_ARGUMENTS –  “Incorrect arguments”
  • ER_INVALID_ARGUMENT_FOR_LOGARITHM –  “Invalid argument for logarithm”

STRICT mode applies to the following types of DML statements:

  • INSERT
  • UPDATE
  • DELETE
  • LOAD DATA
  • ALTER TABLE
  • INSERT… SELECT
  • CREATE TABLE
  • CREATE INDEX
  • CREATE TABLE… SELECT
  • SELECT sleep()

 3. STRICT mode is the default

We have decided to add STRICT_TRANS_TABLES to the list of default sql modes in MySQL 5.7.5. (WL#7764).

Previously, if the user tried to insert 15 characters into a CHAR(10) column, then by default (STRICT mode OFF) the first 10 characters would be inserted, the remaining five would be thrown away, and the user would get a warning. Now the default behavior (STRICT mode ON) is that such a statement is rejected with an error.
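A minimal sketch of that example (the table and value are illustrative):

CREATE TABLE t1 (c CHAR(10));
SET sql_mode = '';                           -- old default: truncate and warn
INSERT INTO t1 VALUES ('0123456789abcde');   -- stores '0123456789' and produces a truncation warning
SET sql_mode = 'STRICT_TRANS_TABLES';        -- new default in 5.7.5
INSERT INTO t1 VALUES ('0123456789abcde');   -- rejected: data too long for column 'c'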

Many existing test cases have been updated so that they run with the new default SQL mode; around 500 MTR test cases were updated as part of this work.

WL#7467, WL#6891, WL#7764 were designed and implemented by Raghav Kapoor.

Improvements to the MySQL `IGNORE` Implementation

In 5.7.5, as a part of the larger effort to improve error handling, we re-implemented the IGNORE clause (WL#6614). The IGNORE clause is a MySQL extension to the SQL standard. It affects errors that occur while processing individual rows. The new implementation aims to make the behavior of the IGNORE clause more consistent.

Statements which support the IGNORE clause are:

  • INSERT [ IGNORE ]
  • UPDATE [ IGNORE ]
  • DELETE [ IGNORE ]
  • LOAD DATA [ IGNORE ]
  • LOAD XML [ IGNORE ]
  • CREATE TABLE… [ IGNORE ] SELECT

When the INSERT statement is used to insert a number of rows into a table, an exception during processing would normally abort the statement and return an error message. With the IGNORE keyword, rows that cause certain exceptions are ignored, but the remaining rows are inserted and the execution of the statement is regarded as successful. The same principle applies to UPDATE, DELETE, LOAD, and CREATE…SELECT statements (the IGNORE keyword only affects the DML part of a CREATE…SELECT statement).

The IGNORE clause has two basic functions which we’ll now describe.

Independent of ‘STRICT’ mode

The IGNORE clause downgrades errors to warnings and continues the processing of a statement. The IGNORE keyword affects the following error codes:

  • ER_SUBQUERY_NO_1_ROW : Subquery returned more than 1 row when one is expected
  • ER_ROW_IS_REFERENCED_2 : Foreign key constraint violation in parent table
  • ER_NO_REFERENCED_ROW_2 : Foreign key constraint violation in child table
  • ER_BAD_NULL_ERROR : NOT NULL constraint violation
  • ER_DUP_ENTRY : Unique key constraint violation
  • ER_DUP_ENTRY_WITH_KEY_NAME : Unique key constraint violation
  • ER_DUP_KEY : Unique key constraint violation
  • ER_VIEW_CHECK_FAILED : CHECK constraint for view fails
  • ER_NO_PARTITION_FOR_GIVEN_VALUE : No partition in the table for the given value
  • ER_NO_PARTITION_FOR_GIVEN_VALUE_SILENT : No partition in the table for the existing value
  • ER_ROW_DOES_NOT_MATCH_GIVEN_PARTITION_SET : Row does not match the given partition set

The row which caused the error is then skipped and the operation continues with the next row. One warning is generated for each ignored exception, so if two rows violate a UNIQUE constraint, we get two warnings. Note that IGNORE never allows constraints to actually be violated inside the database.

A Practical Use Case for IGNORE

We have a table ‘t’ created with the following statement:
CREATE TABLE t(a INT) ENGINE = InnoDB;

Let’s look at an example of 3 rows with 1 duplicate among them.
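For instance (the values are illustrative):

INSERT INTO t VALUES (1), (2), (2);
SELECT * FROM t;   -- returns 1, 2, 2; the value 2 is duplicated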

We cannot add a UNIQUE constraint directly here because of the duplicate, but there is a workaround using the IGNORE clause (note that when adding the UNIQUE key this way, we lose the data found in the duplicate rows).
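One possible way to do this (a sketch, not necessarily the exact statements from the original post) is to copy the data into a new table that has the UNIQUE key, letting IGNORE skip the duplicates:

CREATE TABLE t_unique (a INT, UNIQUE KEY (a)) ENGINE = InnoDB;
INSERT IGNORE INTO t_unique SELECT a FROM t;   -- the duplicate row is skipped with a warning
DROP TABLE t;
RENAME TABLE t_unique TO t;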

With only three rows there is little reason to use IGNORE, but when the number of rows is very large it can be of much help; this simply serves as a simple illustration.

When ‘STRICT’ Mode is ON

STRICT mode controls how MySQL handles invalid or missing values in data changing statements such as INSERT or UPDATE. With STRICT mode ON, MySQL produces an error for invalid values and aborts the statement. Using the IGNORE keyword in a statement cancels the effects of STRICT mode for that statement. The following error codes are not upgraded from warnings to errors by STRICT mode when the IGNORE clause is used:

  • ER_TRUNCATED_WRONG_VALUE : Truncated incorrect value
  • ER_WRONG_VALUE_FOR_TYPE : Incorrect value for function
  • ER_WARN_DATA_OUT_OF_RANGE : Out of range value
  • ER_DIVISION_BY_ZERO : Division by 0
  • ER_TRUNCATED_WRONG_VALUE_FOR_FIELD : Incorrect value for column at row
  • WARN_DATA_TRUNCATED : Data truncated for column
  • ER_DATA_TOO_LONG : Data too long for column
  • ER_BAD_NULL_ERROR : Column cannot be null
  • ER_NO_DEFAULT_FOR_FIELD : Field doesn’t have a default value
  • ER_NO_DEFAULT_FOR_VIEW_FIELD : Field of view underlying table doesn’t have a default value
  • ER_CUT_VALUE_GROUP_CONCAT : Row was cut by GROUP_CONCAT()
  • ER_DATETIME_FUNCTION_OVERFLOW : Datetime function field overflow
  • ER_WARN_TOO_FEW_RECORDS : Row doesn’t contain data for all columns
  • ER_WARN_NULL_TO_NOTNULL : Column set to default value: NULL supplied to NOT NULL column
  • ER_INVALID_ARGUMENT_FOR_LOGARITHM : Invalid argument for logarithm
  • ER_WRONG_ARGUMENTS : Incorrect arguments

In these cases MySQL inserts the adjusted values for invalid or missing values and produces warnings.

An Example Use Case
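As a minimal sketch (the table and values are illustrative), consider an out-of-range value under the default STRICT mode:

CREATE TABLE n (a TINYINT);
INSERT INTO n VALUES (1000);          -- rejected: out of range value for column 'a'
INSERT IGNORE INTO n VALUES (1000);   -- accepted: the value is adjusted to 127 and a warning is produced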

Bugs Fixed by This Worklog

  1. Bug#6196: INSERT IGNORE should return warnings
    For the error codes mentioned in the first case above (“Independent of ‘STRICT’ mode”), no warnings were produced before this worklog when a constraint failed. For example, INSERT IGNORE would silently skip the duplicate values.
  2. Bug#43895: Multi-DELETE IGNORE does not report warnings
    Same case as mentioned in #1 above.
  3. Bug#68726: Update trigger invoked when update ignore means that no update is performed
    When IGNORE is used with an UPDATE statement, there is no error for any of the supported error codes and the statement is marked as having executed successfully. For the rows that did not get updated, however, the AFTER UPDATE trigger is no longer executed.

InnoDB: Supporting Page Sizes of 32k and 64k

In the new InnoDB lab release we support page sizes of 32k and 64k. This gives users even more choices on the page size, allowing you to further customize InnoDB for your particular workload.

There are some things worthy of note related to this new feature:

  1. The extent size changes when innodb_page_size is set to 32k or 64k.
    The extent size is 2M for 32k page sizes, and 4M for 64k page sizes (the extent size is 1M for 4k, 8k, and 16k page sizes). If we do not enlarge the extent size then we will have too many extent headers on the allocation bitmap page, and the bitmap page will overflow.
  2. The innodb_log_buffer_size default value changes when innodb_page_size is set to 32k or 64k. In this case we change the innodb_log_buffer_size default value from 8M to 16M. If we did not increase innodb_log_buffer_size, we could suffer from a lack of free log pages because fewer log buffer pages would be available. (See the sketch after this list for how to check these settings.)
  3. We do not support compression when innodb_page_size is set to 32k or 64k.
    When innodb_page_size is set to 32k or 64k, you will get the following warning if you include a ROW_FORMAT=COMPRESSED or a KEY_BLOCK_SIZE clause in your table DDL:
    Warning 1478 InnoDB: Cannot create a COMPRESSED table when innodb_page_size > 16k
    If innodb_strict_mode is ON, you will instead get the following error:
    Error 1031 Table storage engine for 't1' doesn't have this option
  4. The maximum record size is 16k when innodb_page_size is set to 64k.
    In theory we can store a record whose size is about 50% of the page size. In ROW_FORMAT=REDUNDANT, the record header contains 14-bit pointers to the end of each field, limiting the maximum record size to 16k. For simplicity, we enforce the 16k record size limit on all row formats when innodb_page_size is 64k.
  5. The innodb_page_size setting is global for all tablespaces within a single MySQL server instance.
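As a small sketch of how these settings can be checked (innodb_page_size can only be chosen when the data directory is initialized, for example via my.cnf, and cannot be changed afterwards):

-- [mysqld]
-- innodb_page_size = 64k
SHOW VARIABLES LIKE 'innodb_page_size';        -- e.g. 65536
SHOW VARIABLES LIKE 'innodb_log_buffer_size';  -- 16M by default with 32k or 64k pages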

We look forward to your feedback on this new feature! Please let us know if you encounter any issues, or if you have any more general feedback.

Thank you for using MySQL!

InnoDB General Tablespaces – Preview

The new InnoDB Labs release contains the ability to create and use independent multi-table general tablespaces.

This feature provides a way to group tables together into tablespaces at a location and filename of your choosing. Tables using the Redundant, Compact, and Dynamic row formats can be combined together in the same general tablespace. Compressed tables with the same KEY_BLOCK_SIZE can also be combined together.

The SQL syntax for creating an empty general tablespace is:
CREATE TABLESPACE `tblspace_name` ADD DATAFILE 'tablespace.ibd' [FILE_BLOCK_SIZE=n];

The filename can contain an absolute path or a path relative to the data directory where the InnoDB system tablespace (ibdata1) is found. The file in the example above would be created in the same directory as the InnoDB system tablespace. The file name extension ‘.ibd’ is required on all datafile names.

The FILE_BLOCK_SIZE parameter is only necessary if you want to create a general tablespace that holds compressed tables. The value can be specified in bytes or using the “k” shorthand notation, for example: 4096 or 4k.
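For instance, a minimal sketch (the names and sizes are illustrative) of a tablespace intended for compressed tables, and a compressed table created inside it; the FILE_BLOCK_SIZE is expected to correspond to the KEY_BLOCK_SIZE of the tables it will hold:

CREATE TABLESPACE `ts_zip8` ADD DATAFILE 'ts_zip8.ibd' FILE_BLOCK_SIZE = 8k;
CREATE TABLE t_comp (a INT)
  TABLESPACE = `ts_zip8`
  ROW_FORMAT = COMPRESSED
  KEY_BLOCK_SIZE = 8;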

A table can be added to a general tablespace with:
CREATE TABLE tbl_name TABLESPACE=`tblspace_name`;

Or it can be moved to a general tablespace with:
ALTER TABLE tbl_name TABLESPACE=`tblspace_name`;

And finally, an empty tablespace can be deleted with:
DROP TABLESPACE `tblspace_name`;

This labs release also demonstrates the ability to move tables between the three types of InnoDB tablespaces — the system tablespace, file-per-table tablespaces, and general tablespaces — using the ALTER TABLE tbl_name TABLESPACE='tblspace_name'; statement.  If the specified tblspace_name value is “innodb_system” then the table will be re-created within the system tablespace.  If the specified tblspace_name value is “innodb_file_per_table”, then the table will be re-created as a single-table tablespace.  Thus you have full flexibility to move tables around within your database.
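A minimal sketch of moving a single table through all three tablespace types (tbl_name and tblspace_name are placeholders):

ALTER TABLE tbl_name TABLESPACE = `tblspace_name`;         -- move into a general tablespace
ALTER TABLE tbl_name TABLESPACE = `innodb_system`;         -- re-create it inside the system tablespace
ALTER TABLE tbl_name TABLESPACE = `innodb_file_per_table`; -- re-create it as a file-per-table tablespace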

Note that a general tablespace is not associated with any particular database/schema like a table is.  So it is possible to create tables in multiple database schemas using the same general tablespace, just as it has always been with the system tablespace. And while DROP DATABASE might drop all of the individual tables found within a general tablespace, it will not drop the tablespace itself.

We look forward to getting your input on this new feature! Please let us know if you encounter any issues or have any general feedback. Thank you for using MySQL!

Removing Scalability Bottlenecks in the Metadata Locking and THR_LOCK Subsystems in MySQL 5.7

The MySQL Server 5.7.5 Development Milestone Release, which was published recently, contains some significant changes to the metadata locking (MDL) subsystem and to the usage of the THR_LOCK manager for InnoDB tables. This post provides more information about these changes, which resulted in nice improvements in both scalability and performance.

Sometime during the development cycle of MySQL 5.6 we realized that locks used in the metadata locking subsystem (MDL) implementation can become a scalability bottleneck when many short statements were executed in @autocommit=1 mode against InnoDB tables.

Bug #66473 is a good example (8-table Sysbench POINT_SELECT/InnoDB test). In this particular case the bottleneck was the mutex protecting the MDL_map hash which contained the MDL_lock objects describing the individual metadata locks. To some extent, we were able to solve the problem for this specific case by partitioning the MDL_map hash and the mutex protecting it in MySQL 5.6.8.

But this solution was not without its own problems — see bug #68487 (fixed in 5.6.15). Moreover, it didn’t solve the problem for the case when most of the statements accessed the same table (e.g. 1-table Sysbench POINT_SELECT/InnoDB test). This problem became even more prominent in 5.7 after some improvements were made in the InnoDB storage engine. After these changes we started to see the rwlock protecting the individual metadata lock (MDL_lock::m_rwlock) as the hottest lock in the server for such workloads.

So at some point during the Summer of 2013 we realized that a more drastic solution was needed.

We started by experimenting with a patch which disabled most of the metadata locking subsystem and thus removed the bottlenecks associated with it (and broke correctness in a big way). We immediately saw that the bottleneck moved to the THR_LOCK table-level lock manager, and the THR_LOCK::mutex in particular. Another patch was drafted that disabled THR_LOCK locking for InnoDB tables (which also broke correctness). With both these patches applied we saw the potential for significant scalability/performance improvements.

After these experiments it became obvious that we needed to address issues both in the metadata locking (MDL) subsystem and the THR_LOCK lock manager in order to improve scalability and performance:

  1. For MDL we decided to use the fact that workloads where this subsystem can become a bottleneck mostly involve DML statements, and DDL statements are rare in them. DDL-specific metadata locks are likely to conflict with other metadata locks, while DML-specific metadata locks don’t conflict with each other. So it is possible to improve scalability by making acquisition of DML-specific locks cheap (e.g. as simple as checking a flag and increasing a counter) and making acquisition of DDL-specific locks a bit more expensive. WL#7304 “Improve MDL performance and scalability by implementing ‘fast-path’ for DML locks” implements exactly this idea. Essentially, it decreases the size of the critical section associated with MDL_lock::m_rwlock for DML-specific metadata locks at the expense of DDL-specific locks, and thus makes MDL_lock::m_rwlock less hot.

    Moreover, we found that it was possible to develop this idea further and implement acquisition of DML-specific metadata locks as a lock-free atomic operation by using atomics support and the lock-free hash implementation already present in MySQL Server (and used in the Performance Schema). This resulted in WL#7305 “Improve MDL scalability by using lock-free hash” and WL#7306 “Improve MDL performance and scalability by implementing lock-free lock acquisition for DML”.

    After these changes, acquisition of a DML-specific lock consists of a look-up in a lock-free hash followed by an atomic compare-and-swap operation on a single 64-bit value. This value contains counts of the various types of DML-specific locks acquired and flags indicating the presence of pending/active DDL-specific locks. If there are pending or active DDL-specific locks, we have to fall back to the old approach using the MDL_lock::m_rwlock lock. Similarly, the release of a DML-specific lock is, in the best case, a single atomic compare-and-swap operation which decrements one of these counters (for all of the details, please read the above WLs).

    As a result of this change, MDL_lock::m_rwlock became insignificant in our benchmarks — we no longer see it in the list of hot locks, and MDL_map::m_lock/ MDL_map_partition::m_lock was removed completely as we now use a single lock-free hash.

    One of the interesting consequences of WL#7305 was that we had to change the handling of unused MDL_lock objects (these are objects representing metadata locks which are not acquired by anyone). In 5.6 such objects are simply moved to an LRU list of unused MDL_lock objects. If the list gets bigger than --metadata_locks_cache_size elements, the least recently used object is deleted. With the lock-free algorithm, deletion becomes more expensive, so some kind of caching policy was definitely needed, but implementing an LRU list gets complicated. So instead of deleting the least recently used element we delete a random unused element once the unused/used ratio reaches a certain threshold. This meant that the --metadata_locks_cache_size option no longer made sense so it was removed.

    Also, since MDL_map is now implemented using a single lock-free hash instead of several hash partitions, the --metadata_locks_hash_instances option no longer has any effect and was removed as well.

  2. The situation with the THR_LOCK manager looked a bit simpler. InnoDB doesn’t really need/rely on THR_LOCK locks for the execution of most of the statements. Instead InnoDB relies on its own row-level locks (for DML) and on metadata locks (for DDL and DML/DDL conflicts). The only exception was the LOCK TABLES READ statement for which acquisition of a strong THR_LOCK lock was necessary to properly isolate it from any concurrent DML. Once a new type of metadata lock was introduced, to replace THR_LOCK lock in this role, we were able to remove THR_LOCK acquisition for InnoDB tables completely. Doing this was not as easy as it might sound since there were various questions to answer:
    • What priority should the new lock get compared to other DDL and DML locks?
    • How do we prevent DML from starving out LOCK TABLES READ?
    • What should we do about the LOW_PRIORITY clause?

    As a result we ended-up adding even more lock types to the metadata locking subsystem. See WL#6671 “Improve scalability by not using thr_lock.c locks for InnoDB tables” for details.

    While doing the above we were not able to completely avoid introducing incompatible changes. One thing I would like to note in particular is that a multi-table UPDATE is now blocked by a concurrent LOCK TABLES READ on any table in its join, even if that table is only used for reading.

    On the other hand, the introduction of a new metadata lock type for LOCK TABLES READ allowed us to fix the long-standing bug #57474 “Deadlock when DDL under LOCK TABLES WRITE, READ + PREPARE” as a bonus.

The above changes removed the metadata locking subsystem and the THR_LOCK lock manager from the list of scalability bottlenecks for scenarios involving InnoDB tables. They became part of the MySQL 5.7.5 DMR (some of the changes were already available in the MySQL 5.7.4 DMR).

They are also partially responsible for the impressive numbers which Dimitri Kravtchuk got in his benchmarks. See his recent blog post “Indeed, MySQL 5.7 rocks: OLTP_RO Point-Selects 8-tables Benchmark”. Dimitri also presented these numbers at Oracle OpenWorld 2014 in “MySQL 5.7: Performance and Scalability Benchmarks”, and a link to his presentation is available there.

A big Thank you! goes out to all who participated/helped with this work:

  • Mark Callaghan for reporting bug #66473 and the issues which led to bug #68487, and for urging us to look at scalability bottlenecks in the MDL subsystem
  • Dimitri Kravtchuk for his benchmarks and help in identifying the problems
  • Mikael Ronstrom for discussions about possible solutions to the problem
  • Jon Olav Hauglid for reviewing the design and the code of these worklogs
  • Shipra Jain, Vince Rezula, and Jeb Miller for QAing them
  • And last but not least, to Paul Dubois for documenting changes caused by these WLs

Thank you for using MySQL!

New InnoDB Memcached Functionality in 5.7 Lab Release

InnoDB Memcached’s read-only query performance in 5.7 has already achieved a remarkable record of 1.1 million QPS. The read-only query bottleneck has now shifted to the memcached client itself, so anything that batches queries and shortens the query strings helps.

Multiple Get

In the new 5.7 InnoDB Lab Release, we add support for the “multi-get” option within InnoDB Memcached. This enables users to send multiple keys in a single “get” command. For client-server communication, the packet size is reduced because multiple keys are packaged within a single “get” call. On the InnoDB side, we no longer need to start a transaction and open the table for each key when the keys are packaged together: the lookup of all keys and their corresponding values is done with a single table open, within a single transaction, similar to a “select … where … in ()” query.

Here’s a quick example showing how to retrieve 4 key/value pairs in a single get call:
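The original session is not reproduced here; a hypothetical telnet session against the default demo container (test.demo_test), with purely illustrative keys and values, might look like this:

get A B C D
VALUE A 0 5
Apple
VALUE B 0 6
Banana
VALUE C 0 6
Cherry
VALUE D 0 6
Damson
END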

Notice that it returns all 4 key/value pairs in a single response.

We have not yet done a performance evaluation of this feature. We will blog more about it once the performance results are available.

Range Queries with InnoDB Memcached

Another interesting memcached feature added in this Lab release is that memcached now supports range queries. This is a good example to show how flexible the InnoDB API and InnoDB Memcached can be.

Initially, we support a single compare operator, prefixed with the special marker “@” and placed directly before a key.

For example, if we have the following table:
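As a stand-in for the original table, assume the default demo container test.demo_test (where c1 is the key, c2 the value, and c3/c4/c5 the flags, cas, and expiry columns) is populated as follows; the keys and values are illustrative:

INSERT INTO test.demo_test (c1, c2, c3, c4, c5) VALUES
  ('B', 'Berlin',   0, 0, 0),
  ('D', 'Dublin',   0, 0, 0),
  ('H', 'Helsinki', 0, 0, 0),
  ('L', 'Lisbon',   0, 0, 0);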

Then in order to get all results that are less than “H”, the following query can be issued:
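With the illustrative data above, a hypothetical session might look like this:

get @<H
VALUE B 0 6
Berlin
VALUE D 0 6
Dublin
END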

To include values with “H”, you can use “<=”:
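Again with the illustrative data, the row with key “H” is now included:

get @<=H
VALUE B 0 6
Berlin
VALUE D 0 6
Dublin
VALUE H 0 8
Helsinki
END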

To find rows larger than “B”:
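And a hypothetical session for the greater-than case:

get @>B
VALUE D 0 6
Dublin
VALUE H 0 8
Helsinki
VALUE L 0 6
Lisbon
END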

In this Lab release, we only support range searches bounded by a single key. This can easily be extended moving forward, however, to range searches bounded on both sides (key1 < key < key2, etc.).

Summary

By supporting simple range searches we demonstrate the flexibility and versatility of the InnoDB APIs and InnoDB Memcached. In addition, it shows that InnoDB Memcached can take full advantage of the storage engine’s indexes and support functionalities beyond traditional memcached capabilities, all with superior performance.

Please let us know what you think of this new feature! We’d love to hear your feedback on what else you’d like to see in the InnoDB Memcached plugin moving forward. If you encounter any problems with the new feature, please let us know here in the comments, open a bug report at bugs.mysql.com, or open a support ticket. Thank you for using MySQL!

Back home from MySQL Central @ OpenWorld

Back home after an excellent week at MySQL Central in San Francisco. I want to thank the MySQL Community for all your questions, suggestions, and discussions. Your knowledge about, and passion for, MySQL is stunning. Thank you!

It was also good fun; see the pictures and comments at #MySQLCentral. Presentations are available online, for example my own What’s New in MySQL 5.7? [CON2314] and Alexander Nozdrin’s New Data Dictionary: An Internal Server API That Matters [CON4478].

And don’t miss the Keynote  Oracle MySQL Session 2014 — Screven and Ulin.

See you all next year!


Generated Columns in MySQL 5.7.5

Generated Columns is a new feature available in the latest lab release. This work is based on a contribution by Andrey Zhakov. Thanks, Andrey! The Optimizer team modified it to follow the current MySQL design, and to lift a number of limitations.

The syntax is:

<type>   [ GENERATED ALWAYS ]   AS   ( <expression> )   [ VIRTUAL|STORED ]
[ UNIQUE [KEY] ]   [ [PRIMARY] KEY ]   [ NOT NULL ]   [ COMMENT <text> ]

There are two kinds of Generated Columns: virtual (the default) and stored. Virtual means that the column is calculated on the fly when a record is read from the table. Stored means that the column is calculated when a new record is written to the table, and after that it is treated as a regular field. Both kinds can have NOT NULL restrictions, but only a stored Generated Column can be part of an index.

There are still a few limitations on Generated Columns:

  • The generation expression used can only call native deterministic functions; stored routines and UDFs are not yet supported.
  • The length of the generation expressions for a table is subject to the limitations of the .frm file, so in total they can’t be longer than 64K. You could, for example, have one field with a generation expression of 64K, or 30 fields with an average expression length of about 2K each.
  • A Generated Column can’t refer to itself or to other Generated Columns that are defined later, but it can refer to any previously defined Generated Column. This limitation does not apply to regular columns: a Generated Column can refer to any regular column, no matter where it is defined.

What can Generated Columns be used for? Many things. To name a few:

  • As a materialized cache for often used expressions:

    The two queries (one using the raw expression, one using the Generated Column) give exactly the same result, but in the first one ExtractValue(doc, '/user/username') is evaluated three times per record read, while in the second it is evaluated only once per record read (a sketch of this pattern follows this list). If the “username” column were defined as STORED, the generation expression would be evaluated only when a record is inserted or updated.

    Another, similar case is that Generated Columns (GC) can be used to add flexibility by replacing often-used expressions with a GC. For example, if you have a bunch of applications that work on the same database, it might be practical to have a unified way to access the data without needing to keep all of the apps in sync.

  • Providing indexes for joins with non-relational data:

    With an index on a stored Generated Column, ref access over the index can be used to access a table with XML data.

  • Working around a limited set of partitioning functions:

    Note that only one partition is going to be scanned, thanks to partition pruning (see the partitioning sketch below).
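Below are two minimal sketches of the use cases above; all table, column, and expression names are illustrative and not taken from the original post. The first shows an often-used ExtractValue() expression cached in a stored Generated Column and indexed; the second shows partitioning on a stored Generated Column derived from an expression that could not be used directly as a partitioning key.

-- Caching and indexing an often-used expression:
CREATE TABLE users (
  doc      TEXT,
  username VARCHAR(64) GENERATED ALWAYS AS (ExtractValue(doc, '/user/username')) STORED,
  KEY idx_username (username)
);
-- Instead of repeating ExtractValue(doc, '/user/username') in the SELECT list,
-- WHERE clause, and ORDER BY, the query can simply use the generated column
-- (and the index on it):
SELECT username FROM users WHERE username LIKE 'a%' ORDER BY username;

-- Working around partitioning limitations by partitioning on a generated column:
CREATE TABLE addresses (
  postcode CHAR(8),
  region   CHAR(2) GENERATED ALWAYS AS (LEFT(postcode, 2)) STORED
)
PARTITION BY LIST COLUMNS (region) (
  PARTITION p_no VALUES IN ('NO'),
  PARTITION p_se VALUES IN ('SE')
);
-- A query such as SELECT * FROM addresses WHERE region = 'NO' only needs to
-- scan partition p_no, thanks to partition pruning.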

As always, there is more than one way to do things. Generated Columns add yet another means of solving a variety of interesting challenges, and they make it more convenient to deal with derivatives of relational and non-relational data. We look forward to seeing all of the interesting ways that you apply the feature!