Building a Better CREATE USER Command

Prior to MySQL 5.7, the CREATE USER command had a number of limitations:

  • No way to set both authentication plugin and password
  • No way to disable a user
  • No way to define user resource limitations
  • No way to set a non-default password expiration policy
  • No way to require SSL/x509

All of these things could be done through other means, but typically involved other statements, such as GRANT commands. Starting with MySQL 5.7.6, these can all be done through a new and improved CREATE USER syntax.
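For example, with the extended syntax all of the above can be expressed in a single statement. The sketch below is illustrative only (the account name and values are made up), but it follows the 5.7.6 grammar:

    CREATE USER 'appuser'@'localhost'
      IDENTIFIED WITH mysql_native_password BY 'password123'  -- plugin and password together
      REQUIRE SSL                                              -- require an encrypted connection
      WITH MAX_QUERIES_PER_HOUR 500 MAX_USER_CONNECTIONS 10    -- resource limits
      PASSWORD EXPIRE INTERVAL 90 DAY                          -- non-default expiration policy
      ACCOUNT LOCK;                                            -- create the account disabled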


Removal and Deprecation in MySQL 5.7

With the shipment of the first release candidate (RC) of MySQL 5.7, the next major version of the server is rapidly shaping up. Over the course of the nearly two and a half years that have passed since 5.6 went GA, we have put a lot of work into streamlining the server code in order to ease the burden of developing and maintaining such a large product and codebase.

An important aspect of this work is deprecation and removal. To explain the terms, deprecating a feature means that we signal to the outside world that “this feature is available now, but it will be removed in a future release, so please adjust your use case accordingly”. Removing a feature means just that – in one version the feature is available, but then it is gone, and if you try to use it, you will get an error message saying that the feature is unknown.

Generally we don’t remove features in existing GA releases, but we reserve the right to deprecate features that we will later remove in a development release (which is often the upcoming GA).

Earlier, the approach here was quite conservative: broadly speaking, the rule was to add deprecation warnings in version N and only remove the feature in version N+2. However, in 5.7 we have taken a more proactive approach to both deprecation and removal: in some cases we added deprecation warnings for features in 5.6 point releases and then removed those features in 5.7.

This blog post gives a summary of all the deprecations and removals that have taken place in MySQL 5.7. Most of these have already been covered in the various milestone blog posts that Geir Høydalsvik has published.

Features that have been removed

InnoDB monitoring features

The magic built-in InnoDB table names (innodb_monitor, innodb_lock_monitor, innodb_tablespace_monitor, innodb_table_monitor, and innodb_mem_validate) have been removed. Historically, issuing CREATE TABLE with one of these names (in any database schema) set flags inside InnoDB to enable additional debug output to the error log, and DROP TABLE cleared the flags and stopped the output. This mechanism was awkward and difficult to maintain alongside current and future refactoring plans in InnoDB, so it was removed. Going forward,

  1. Instead of innodb_monitor, one can use SET GLOBAL innodb_status_output=ON|OFF
  2. Instead of innodb_lock_monitor, use
    • SET GLOBAL innodb_status_output=ON|OFF
    • SET GLOBAL innodb_status_output_locks=ON|OFF
  3. There is no direct replacement for the two tables innodb_tablespace_monitor and innodb_table_monitor, but information is available in INFORMATION_SCHEMA.SYS_TABLES and INFORMATION_SCHEMA.SYS_TABLESPACES. We also plan to add more information to INFORMATION_SCHEMA.FILES.
  4. innodb_mem_validate was completely removed, as the functionality was only present when the UNIV_MEM_DEBUG flag was set at compile time, something which is not even done for normal debug builds.

(Remove innodb_*monitor tables in 5.7, WL#7377, see also Morgan Tocker’s blog post)

Old Passwords

In MySQL 5.6 we deprecated passwords that used the older pre-4.1 password hashing format. Support for these passwords was removed in 5.7:

  • The server-side mysql_old_password authentication plugin is removed. Accounts that use this plugin are disabled at startup and the server writes an “unknown plugin” message to the error log.
  • The client-side mysql_old_password authentication plugin is removed from the C client library.
  • The --secure-auth option to the server and client programs is the default, but is now a no-op. It is deprecated and will be removed in a future MySQL release.
  • The --skip-secure-auth option to the server and client programs is no longer supported and using it produces an error.
  • The secure_auth system variable permits only a value of 1; a value of 0 is no longer permitted.
  • For the old_passwords system variable, a value of 1 (produce pre-4.1 hashes) is no longer permitted.
  • The OLD_PASSWORD() function is removed.
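Before upgrading, it is worth checking whether any accounts still rely on pre-4.1 hashes. A query along these lines (a rough sketch against the 5.6 grant tables) will find them:

    SELECT User, Host
      FROM mysql.user
     WHERE plugin = 'mysql_old_password'
        OR LENGTH(Password) = 16;  -- pre-4.1 hashes are 16 hex characters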

(Remove the deprecated old_password plugin, WL#8006, see also Morgan Tocker’s blog post)

SQL syntax

In 5.7 we have removed IGNORE for ALTER TABLE. We made this decision after Morgan’s blog post seeking community advice on this topic. We decided to remove IGNORE because it caused problems with foreign keys (rows removed in the parent table), prohibited true online CREATE UNIQUE INDEX, and caused problems for replication. (Remove IGNORE for ALTER TABLE in 5.7, WL#7395)

The ERROR_FOR_DIVISION_BY_ZERO, NO_ZERO_DATE, and NO_ZERO_IN_DATE SQL modes have been folded into strict mode in 5.7. Unfortunately, as reported by Simon Mudd in bug#75439, this change breaks replication, and we are currently reconsidering this decision. (Remove ERROR_FOR_DIVISION_BY_ZERO, NO_ZERO_DATE, NO_ZERO_IN_DATE SQL MODES and make their functionality part of STRICT MODE in 5.7, WL#7467, see also Morgan Tocker’s blog post)

We removed support for INSERT DELAYED in 5.7 and onwards. INSERT DELAYED has a lot of limitations and issues attached to it, and with InnoDB as the default engine it becomes even less relevant. We decided to keep the INSERT/REPLACE DELAYED syntax, but convert it to a plain INSERT/REPLACE with a warning before execution. (Remove INSERT DELAYED, WL#6073)

In 5.7, we removed all remnants of YEAR(2) related code. Issues with the YEAR(2) data type include ambiguity of displayed values, and possible loss of information when values are dumped and reloaded or converted to strings. Overall we removed the support for YEAR(2) in 5.6, but we kept some limited support for old YEAR(2) columns to allow for upgrade procedures to MySQL 5.6. Even this limited support is now gone in 5.7. (Remove limited YEAR(2) support, WL#6263)

Options and Variables

We have deprecated unique option prefixes in 5.5 and 5.6, and from 5.7 we only accept full option names. The old behavior has been found to be confusing and non-future proof. For example, the --compress option could be given to mysqldump as --compr, but not as --comp because the latter was ambiguous. However, adding extra options through plugins could cause prefixes in scripts to be unreliable even across server restarts. (Deprecate Unique Option Prefixes, WL#6978)

The timed_mutexes system variable was deprecated in 5.5/5.6 and removed in 5.7. This system variable had already been disconnected from any underlying InnoDB code, so it no longer served any purpose. (Deprecate and remove the timed_mutexes system variable, WL#7436)

The server options which disable InnoDB have been removed. The system tables are being moved from MyISAM to InnoDB, so in 5.7+ it will not be possible to run the MySQL server without InnoDB (which is the default storage engine since 5.5). (Deprecate skip-innodb in 5.6 and remove it in 5.7, WL#7976)

The innodb_use_sys_malloc and innodb_additional_mem_pool_size system variables were removed in 5.7 because doing so simplified InnoDB scalability efforts: the InnoDB home-grown memory manager was not as good as jemalloc or tcmalloc, since the latter two are more scalable. (Remove innodb_use_sys_malloc & innodb_additional_mem_pool_size system variable in 5.7, WL#7628)

The innodb_mirrored_log_groups system variable was deprecated in 5.6 and removed in 5.7. The allowed range of the parameter innodb_mirrored_log_groups was 1..10, but there has always been (at least since 3.23.53) a check within InnoDB startup that refuses any value except 1. This is not meaningful and thus we decided to remove it. (Remove innodb_mirrored_log_groups, WL#6808)

The innodb_file_io_threads system variable, which was deprecated in 5.5, was removed in 5.7. (Remove the innodb_file_io_threads variable in 5.7, WL#7149)

The storage_engine system variable was deprecated in 5.5 and finally removed in 5.7. You should instead use the default_storage_engine system variable. (Remove the storage_engine system variable in 5.7, WL#7148)

The --basedir and --datadir options to mysql_upgrade were no longer in use, and they have been removed (WL#7010).

Outdated Command Line Programs

This work was triggered by a blog post from my colleague Todd Farmer, with the title Spring cleaning useless clients, where he asked about the relevance of a number of clients that we shipped with the MySQL Server.

As a consequence, and after some more feedback from a blog post by Morgan Tocker, the following clients were deprecated in 5.6 and have been removed in 5.7: mysqlaccess (WL#7620), mysql_convert_table_format (WL#7034), mysql_fix_extensions (WL#7036), mysql_find_rows.sh (WL#7621), mysql_setpermission (WL#7033), msql2mysql (WL#7035), mysqlbug (WL#7689), mysql_zap and mysql_waitpid (WL#7826), mysqlhotcopy (WL#7854)

Various Removals

The old custom Windows symlink implementation has been removed in MySQL 5.7. Starting from Windows Vista/Server 2008, a native symlink mechanism is supported by Windows (see the mklink command). MySQL will now use the native Windows mechanism going forward. (Remove support for .sym files, WL#6561)

Features that have been deprecated (but not removed)

In 5.7, we have replaced the EOF packet with the new OK packet in the MySQL client-server protocol. The EOF and OK packets serve the same purpose — to mark the end of a query execution result. The protocol is kept backward compatible as clients and servers have one additional capability flag to denote their support for the new OK packet. When this capability flag is in place, any SQL statement which produces a result set will send an OK packet as a marker to indicate the end of data rather than an EOF packet. (Deprecate the EOF packet, WL#7766, see also Morgan Tocker’s blog post)

In 5.6 we deprecated SHOW PROFILE in favor of the Performance Schema instrumentation. In 5.7, we have completed this work by deprecating the INFORMATION_SCHEMA.PROFILING table. See also Bug#67525, reported by Peter Laursen. (Deprecate the INFORMATION_SCHEMA.PROFILING Table, WL#6802)

As part of the increased GIS effort, we’ve started cleaning up the name space for GIS functions. This is done by deprecating, removing, and adding function names (aliases) in order to make the naming scheme consistent and to bring MySQL in line with the OGC standard and other major DBMSs.

With the exception of the constructor functions that have the same names as the data types (Point, LineString, Polygon, MultiPoint, MultiLineString, MultiPolygon and GeometryCollection) and functions that do calculations on minimum bounding rectangles (MBRs), all spatial functions now have names prefixed by the standard “ST_” (spatio-temporal) convention, and function names without that prefix have been deprecated. This also means that it is a lot easier to take your queries from other SQL/MM compliant database systems and run them on your MySQL databases. (Consistent naming scheme for GIS functions, WL#8055, see also the blog post from Erik Frøseth, Spring cleaning in the GIS namespace)
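As a small illustration of the renaming (not an exhaustive list), the unprefixed aliases still work but are deprecated:

    SELECT AsText(GeomFromText('POINT(1 1)'));        -- deprecated unprefixed aliases
    SELECT ST_AsText(ST_GeomFromText('POINT(1 1)'));  -- preferred ST_-prefixed names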

The ENCRYPT(), DES_ENCRYPT() and DES_DECRYPT() functions based on the Data Encryption Standard (DES) have been deprecated in favor of Advanced Encryption Standard (AES) based functions, because AES provides much better security. (Deprecate the ENCRYPT, DES_ENCRYPT and DES_DECRYPT functions, WL#8126)

We now trigger a warning (ER_WARN_DEPRECATED_SYNTAX) upon the execution of the ENCODE() or DECODE() functions. (Deprecate ENCODE()/DECODE() in 5.7, WL#6984)

We have deprecated the SET PASSWORD syntax and the PASSWORD() function. The existing ALTER USER statement is modified to cover the deprecated functionality. The PASSWORD() function was originally introduced as a way to manually update the mysql.user table. This is generally a bad idea, and we want to leave the task of managing user authentication properties exclusively to the ALTER USER statement, which automatically determines which authentication plugin should be used and then adjusts the password algorithm accordingly. (Deprecate PASSWORD() and extend ALTER USER syntax to manage authentication attributes, WL#6409)
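As a rough illustration (the account name is made up), the deprecated and preferred forms look like this:

    -- Deprecated:
    SET PASSWORD FOR 'appuser'@'localhost' = PASSWORD('new_password');

    -- Preferred going forward:
    ALTER USER 'appuser'@'localhost' IDENTIFIED BY 'new_password';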

We have deprecated the following 4 InnoDB parameters: innodb_large_prefix, innodb_file_format, innodb_file_format_check and innodb_file_format_max.

The parameter innodb_large_prefix was introduced in MySQL 5.5, so that users could avoid accidentally creating tables that would be incompatible with the InnoDB Plugin in MySQL 5.1. Now that MySQL 5.1 is not supported any more, it makes no sense to use any other value than innodb_large_prefix = ON.

The innodb_file_format family of parameters only made sense for the InnoDB Plugin in MySQL 5.1, when it was distributed separately from MySQL. Its purpose was to allow users to downgrade to the built-in InnoDB in MySQL 5.1, to avoid using an incompatible format when testing the 5.1 plugin. Since MySQL 5.1, the InnoDB formats have changed, but innodb_file_format has not been bumped. (InnoDB: Change defaults and deprecate settings in next 5.7 release, WL#7703)

In 5.7, the EXTENDED and PARTITIONS keywords of the EXPLAIN syntax have been deprecated. Currently, EXPLAIN has EXTENDED and PARTITIONS flags that show additional info compared to regular EXPLAIN output. However, EXPLAIN JSON already behaves as if those flags were set and prints all available information. After deprecation, regular EXPLAIN will behave exactly as if those flags are given: all additional columns and warnings are printed automatically. (Deprecate EXTENDED and PARTITIONS flags of EXPLAIN, WL#7027, see also Morgan Tocker’s blog post)
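In other words, something like the following (the table name is hypothetical):

    -- Old forms, now deprecated:
    EXPLAIN EXTENDED   SELECT * FROM t1 WHERE a = 1;
    EXPLAIN PARTITIONS SELECT * FROM t1 WHERE a = 1;

    -- Plain EXPLAIN now includes that information, and the JSON format prints everything available:
    EXPLAIN SELECT * FROM t1 WHERE a = 1;
    EXPLAIN FORMAT=JSON SELECT * FROM t1 WHERE a = 1;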

We have deprecated the collation_database and character_set_database system variables in 5.7. (Deprecate GLOBAL @@collation_database, @@character_set_database; assignment of SESSION counterparts, WL#3811, see also Bug#35357, Bug#27208, and Bug#27687 as well as Morgan Tocker’s blog post)

The conversion of pre MySQL 5.1 encoded database/schema names has been deprecated in 5.7. (Deprecate the conversion of pre MySQL 5.1 encoded database names, WL#8186).

The sync_frm system variable has been deprecated in 5.7. (Deprecate and remove the sync_frm sysvar, WL#8216)

The @@session.gtid_executed system variable has been deprecated in 5.7. This variable contains the set of GTIDs currently stored in the transaction cache. However, its value can be completely determined from the value of the system variable gtid_next, so it is of limited practical value. Also, it is not very well named, and it was decided to deprecate it. (Deprecate @@session.gtid_executed, WL#7518)

Getting Started With MySQL & JSON on Windows

MySQL is getting native support for JSON.  This blog post will show you how to quickly get the MySQL server with these new features running on your Windows rig and how to write a small C# program in Visual Studio 2015 that stores a JSON document using the new native JSON data type.

Schema or Schemaless

The upcoming 5.7 version of MySQL introduces a ton of new features, some of which I am quite excited about—in particular the native JSON support and Generated Columns. I think these can be very powerful for rapid development where the database is constantly changing as I add features and refactor existing ones. With the introduction of the new JSON data type I can combine schema and schemaless data.

Schema-bound or schemaless with ACID features, combined with transparent and easy data model changes? Yes, thank you, both please.

 

Installing Docker on Windows

The first step is to get the JSON labs release version of the MySQL server running on our computer. The easiest way is to have Docker do this for us. If you already have Docker installed (or Boot2Docker) then you can skip this step.

 

Installing the MySQL JSON Labs Release

Now that we have Docker installed, let’s start it up. Simply double-click the new desktop icon. This will then take us into a shell.


 

Now install the MySQL JSON Labs release using some Docker magic:

You can then use this command to verify that the server is up and running:

Which should read something like this:

 

Connecting to the Server Using the MySQL Client

Let’s then start a bash process inside the same container that the new MySQL server is running in. This will give us access to the mysql command-line client which we can then use to interact with the MySQL server.

At the Docker prompt, enter docker exec -it ml bash; this gives us a bash shell inside the container. From here, enter mysql -p to connect to the MySQL server. The password is ‘my-secret-pw’ or whatever you chose at install.

 

Trying out the New Native JSON Type

All righty then, finally we can play around with the new JSON type. It has a lot more to offer than simply storing JSON data; heck, one could do that in a TEXT/BLOB or VARCHAR/CHAR type. The JSON data type provides validation and fast lookups of sub-elements inside the JSON document.
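Here is a minimal sketch of both aspects. I am using the JSON function names as they appear in the 5.7 documentation; the labs release may use slightly different (JSN_-prefixed) names, so adjust accordingly:

    -- Validation: a JSON column rejects invalid documents
    CREATE TABLE employee (
      id   INT NOT NULL PRIMARY KEY,
      info JSON
    );

    INSERT INTO employee VALUES (1, '{"name": "Anna", "skills": ["mysql", "json"]}');
    -- INSERT INTO employee VALUES (2, 'not valid json');  -- this would fail with an error

    -- Looking up a sub-element inside the document
    SELECT JSON_EXTRACT(info, '$.name') FROM employee;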

 

 

Exposing the Server Port to the OS

We now have a MySQL server running on a virtual Linux server inside VirtualBox. The problem is that this MySQL server is living inside a Docker container, completely cut off from any outside access, but we want to interact with it from our Windows host. So, what we need to do is to tell Docker to expose the server port to VirtualBox, and then tell VirtualBox to expose that port to Windows.

We already did the first step by using the “-p 3306:3306” parameter when we installed the mysql/mysql-labs:5.7.7-json container.  Read more about docker run -p if you are interested in knowing all of the gory details.

The next step requires starting VirtualBox.  We have exposed the server port in Docker to VirtualBox, now we need to tell VirtualBox to pass that port along to Windows. You should have a start icon on the desktop after installing Boot2Docker.

Go to Settings -> Network -> Port forwarding and add a rule that forwards host port 3306 to guest port 3306 (the MySQL server port we exposed from the container).

This is a good time to open a command shell and check if you see port 3306 being listened on.

 

Create a Project in Visual Studio 2015

So in case you haven’t been paying attention—Microsoft has released a free and fully featured version of their IDE which I recommend you get if you’re even remotely interested in developing on the Windows platform (or even for mobile and Web development).

Head over to www.visualstudio.com and grab it.  I use the community edition.

Once you have it installed, then create a new project called JsonTest (for example), and to keep it simple choose a project of type ‘Console Application’.

 

Add MySQL Connector/Net to the Project

Now we want to communicate with the MySQL server that is running in our Docker container from our C# program, so what we need is MySQL Connector/Net, a pure C# library for exactly this purpose.

  1. In Solution Explorer, right-click on ‘References’ to bring up the menu
  2. Choose “Manage nuget packages” from the menu.
  3. Type in ‘mysql’ in the search bar
  4. Choose the ‘MySql.Data ADO.Net driver for MySQL’.
  5. Press the install button

 

C# Code for Using the New JSON Data Type

This is a simple example program of how to create a table with a JSON type column, INSERT a couple of JSON Documents, and then finally to read them back from the database.

Notice line 36: here I do a small trick to avoid getting the MySqlException ‘Unknown data type’, because this is a Labs release that contains functionality which is slightly ahead of the current generation of connectors. So, to work around the fact that the connector is clueless about the new JSON type, I CAST the column to CHAR(40) in the SELECT statement.
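On the SQL side, that workaround amounts to something like the following (the table and column names here are hypothetical, not taken from the program):

    -- Instead of selecting the JSON column directly:
    --   SELECT doc FROM jsontest;
    -- cast it so that older connectors see a type they understand:
    SELECT CAST(doc AS CHAR(40)) FROM jsontest;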

Copy & paste the program below into your own project and hit CTRL-F5 to run it:

You should then get the following result:

As you can see, we’ve successfully created a table in MySQL 5.7 which has a JSON column, we’ve then INSERTED two JSON Documents into that table, and finally we’ve queried the table to get JSON Documents back.

That’s it for now. I hope that this brief introduction was helpful! Please stay tuned as we add additional helpful information regarding working with JSON in MySQL 5.7. As always, THANK YOU for using MySQL!

Improving the Performance of MySQL on Windows

In this blog entry I’d like to describe how you might be able to improve how MySQL performs on Windows by ensuring that you take advantage of a Windows specific configuration setting.

On Unix systems, MySQL programs treat the localhost host name specially. For connections to localhost, MySQL programs attempt to connect to the local server by using a Unix socket file, which has some performance advantages over a TCP/IP connection. Windows does not support Unix sockets, however, and hence does not benefit from this optimisation.

However, the use of shared memory connections on Windows can offer significant performance improvements over the use of TCP/IP connections. Shared memory connections are obviously only useful when both the MySQL client and server processes are executing on the same machine, but when they are, the performance benefits of using shared memory connections can be considerable. To enable shared memory connections, you would use the shared_memory system variable.
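Shared memory connections are not enabled by default, so a quick sanity check on a Windows server (a small illustrative query) is:

    SHOW GLOBAL VARIABLES LIKE 'shared_memory%';
    -- shared_memory           ON      (OFF unless the server was started with it enabled)
    -- shared_memory_base_name MYSQL

A client on the same machine can then request this transport explicitly, for example with the mysql client’s --protocol=MEMORY option.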

The following screenshot compares the sysbench OLTP read write test performance when using shared memory connections versus when using TCP connections with MySQL 5.6.24 on Windows Server 2012 R2. The graph shows the average number of transactions per second (TPS) measured by sysbench at a variety of thread counts. The TPS value averaged over five runs is plotted at each thread count, with each run taking 300 seconds.

[Graph: Comparing shared memory and TCP connection performance using MySQL 5.6.24]

The uppermost red plot in the graph above shows the results obtained when using shared memory connections between sysbench and MySQL. The lower black plot shows the results obtained when using TCP connections.

Note that as well as giving improved performance when using shared memory connections, the shared memory connections results show reduced variability (apart from the results at 512 threads).

Obviously, changing to a more efficient communication path between the MySQL server and client will only make a dramatic difference like that shown above when the MySQL server isn’t spending a lot of time processing queries and doing file I/O. So if you use MySQL on Windows in a standalone configuration (with the application clients and the MySQL server on the same machine) and switch from using TCP connections to shared memory connections, then the performance improvement that you experience may not be as great.

The test machine used to produce the results shown above is a Sun Fire X4170 M2 Server with 24 logical CPUs running at 2930MHz.

The following graph shows the results of running the same TCP vs shared memory connection tests using the MySQL 5.7.7 release candidate:

[Graph: Comparing shared memory and TCP connection performance using MySQL 5.7.7rc]

Note that the MySQL 5.7.7 release candidate shows better performance at high thread counts than MySQL 5.6.24 in the graphs above. Profiling the shared memory test runs at 1024 threads for the different versions of MySQL reveals that the second “hottest” function in MySQL 5.6.24 at this thread count is RtlpEnterCriticalSectionContended, whereas this function has dropped to fourth place in the MySQL 5.7.7 profile. A call to RtlpEnterCriticalSectionContended on Windows indicates contention for a critical section (a multithreading lock).

The reduction in lock contention with MySQL 5.7.7 fits nicely with the performance and scalability improvements in MySQL 5.7 mentioned in Geir’s blog entry: What’s new in MySQL 5.7 (First Release Candidate).

The tests used to produce the results shown above are part of the regular performance tests that run daily in the Oracle test labs against the latest builds of MySQL on various platforms. The results of these daily performance tests are automatically checked for performance regressions and failures and engineers are alerted if any are found.

One of the characteristics of a good performance test is repeatability. In order to reduce the variability of the TPS results reported by sysbench on the machine used for the test results, the sysbench process was affinitized to use CPUs 0,1,2,3,12,13,14,15 and the MySQLD process was affinitized to use CPUs 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23. These CPU affinities were arrived at by experimenting to find the affinity settings that provided the highest and most stable TPS results.

Microsoft has also introduced an enhancement to the “loopback” TCP/IP performance on Windows 8 and Windows 2012 via the “TCP Loopback Fast Path” (see http://blogs.technet.com/b/wincat/archive/2012/12/05/fast-tcp-loopback-performance-and-low-latency-with-windows-server-2012-tcp-loopback-fast-path.aspx ). The graph below shows the results I obtained when I experimented with a patch on the communications library used by MySQL 5.7.6 m16 and sysbench that set the SIO_LOOPBACK_FAST_PATH option to 1.

[Graph: Comparing Fast Loopback TCP and standard TCP connection performance using MySQL 5.7.6-m16]

The lower black plot shows the results from the “normal” unpatched TCP connections. The upper red plot shows the results when using the communications library patched with the “Loopback Fast Path” setting.

Note that the performance gain from using the “TCP Loopback Fast Path” patch over normal TCP connections is not as significant as that available from switching to shared memory connections. The data in the graph above is also not directly comparable with the previous graphs showing the shared memory connection performance as the CPU affinity settings differ.

In my next post, I hope to show how some previously undocumented settings can further improve MySQL’s performance on Windows. So please stay tuned!

As always, thank you for using MySQL!

The InnoDB Change Buffer

One of the challenges in storage engine design is random I/O during a write operation. In InnoDB, a table will have one clustered index and zero or more secondary indexes.  Each of these indexes is a B-tree.  When a record is inserted into a table, the record is first inserted into the clustered index and then into each of the secondary indexes.  So, the resulting I/O operations will be randomly distributed across the disk.  The I/O pattern is similarly random for update and delete operations. To mitigate this problem, the InnoDB storage engine uses a special data structure called the change buffer (previously known as the insert buffer, which is why you will see ibuf and IBUF used in various internal names).

The change buffer is another B-tree, with the ability to hold the record of any secondary index.  It is also referred to as a universal tree in the source code.  There is only one change buffer within InnoDB and it is persisted in the system tablespace.  The root page of this change buffer tree is fixed at FSP_IBUF_TREE_ROOT_PAGE_NO (which is equal to 4) in the system tablespace (which has space id of 0). When the server is started, the change buffer tree is loaded by making use of this fixed page number.   You can refer to the ibuf_init_at_db_start() function for further details.

The total size of the change buffer is configurable and is designed to ensure that the complete change buffer tree can reside in main memory.   The size of the change buffer is configured using the innodb_change_buffer_max_size system variable.
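Both the size limit and the set of buffered operations can be inspected or changed at runtime. The values below are purely illustrative, not recommendations:

    -- Maximum size of the change buffer, as a percentage of the buffer pool (default 25):
    SET GLOBAL innodb_change_buffer_max_size = 25;

    -- Which operations may be buffered (none, inserts, deletes, changes, purges, or all):
    SET GLOBAL innodb_change_buffering = 'all';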

Overview of Change Buffering

Change buffering is applicable only to non-unique secondary indexes (NUSI).  InnoDB buffers 3 types of operations on a NUSI: insert, delete marking, and delete.  These operations are enumerated by ibuf_op_t within InnoDB (IBUF_OP_INSERT, IBUF_OP_DELETE_MARK, and IBUF_OP_DELETE, respectively).

One important point to remember is that the change buffering is leaf page oriented. A particular operation to NUSI is buffered in the change buffer only if the relevant non-root leaf page of the NUSI is not already available in the buffer pool.  This means that the buffered change is predefined to happen in a particular leaf page of a NUSI within the InnoDB system.  This makes it necessary to track the free space available in the NUSI leaf pages.  This tracking is necessary because merging these buffered operations to the NUSI leaf page must not result in a B-tree page split or B-tree page merge.

Special Change Buffer Fields

When a NUSI record is buffered in the change buffer, 4 special change buffer fields are added to the beginning of the NUSI record.  Each of these 4 fields and their contents are explained below.  The primary key of the change buffer tree is then {space_id, page_no, count}, where the count helps to maintain the order in which the changes were buffered for that particular page.  The change buffer row format has evolved over a period of time and the following table provides information for MySQL 5.5+:

  1. IBUF_REC_FIELD_SPACE (4 bytes) – The identifier of the space in which the NUSI exists.
  2. IBUF_REC_FIELD_MARKER (1 byte) – Indicates whether the change buffer row format is old or new. If present (and zero), the row format is the newer one (MySQL 4.1+).
  3. IBUF_REC_FIELD_PAGE (4 bytes) – The leaf page number of the NUSI to which the buffered row belongs.
  4. IBUF_REC_FIELD_METADATA – Consists of:
    • 2 bytes – Counter field, used to sort records within a (space_id, page_no) combination in the order they were added.
    • 1 byte – Operation type (ibuf_op_t).
    • 1 byte – Row format flag. If 0, the user index record is in REDUNDANT row format; if IBUF_REC_COMPACT, the user index record is in COMPACT row format.
    • Variable length (DATA_NEW_ORDER_NULL_TYPE_BUF_SIZE, i.e. 6 bytes per field) – Type information affecting the alphabetical ordering of the fields and the storage size of an SQL NULL value.
  5. IBUF_REC_FIELD_USER – The first user field.

The row format of the change buffer records themselves is always REDUNDANT.

Change Buffer Bitmap Page

The free space information for each page is tracked in predefined pages called the change buffer bitmap page, which is also known as the ibuf bitmap page.  These pages always follow the extent descriptor pages.  The following table gives the predefined page numbers of the change buffer bitmap pages:

  • 4 KB pages: extent size 256 pages; extent descriptor pages 0, 4096, 8192, 12288, ...; ibuf bitmap pages 1, 4097, 8193, 12289, ...; each bitmap page describes 4096 pages.
  • 8 KB pages: extent size 128 pages; extent descriptor pages 0, 8192, 16384, 24576, ...; ibuf bitmap pages 1, 8193, 16385, 24577, ...; each bitmap page describes 8192 pages.
  • 16 KB pages: extent size 64 pages; extent descriptor pages 0, 16384, 32768, 49152, ...; ibuf bitmap pages 1, 16385, 32769, 49153, ...; each bitmap page describes 16384 pages.
  • 32 KB pages: extent size 64 pages; extent descriptor pages 0, 32768, 65536, ...; ibuf bitmap pages 1, 32769, 65537, ...; each bitmap page describes 32768 pages.
  • 64 KB pages: extent size 64 pages; extent descriptor pages 0, 65536, 131072, ...; ibuf bitmap pages 1, 65537, 131073, ...; each bitmap page describes 65536 pages.

The page number 1 is also referred to as the FSP_IBUF_BITMAP_OFFSET. These change buffer bitmap pages help to answer the following questions quickly:

  • Does the given page have any buffered changes in the ibuf (insert/change buffer) tree?  This question will be asked when a page is read into the buffer pool.  The buffered changes will be merged to the actual page before putting it into the buffer pool.
  • Does the given page have enough free space so that a change can be buffered? This question will be asked when we want to modify a leaf page of NUSI and it is not already available in the buffer pool.

In the next section we will look at the information stored in the ibuf bitmap page which helps to answer the above questions.

Information Stored in Change Buffer Bitmap Page

The change buffer bitmap page uses 4 bits (IBUF_BITS_PER_PAGE) to describe each page. It contains an array of such 4-bit entries, one for each page in its range.  This whole array is called the “ibuf bitmap” (insert/change buffer bitmap).  This array begins after the page header, at an offset equal to IBUF_BITMAP (which is equal to 94). Given a page number, the ibuf bitmap page that contains the 4-bit information for the given page can be calculated as: ulint bitmap_page_no = FSP_IBUF_BITMAP_OFFSET + ((page_no / page_size) * page_size);

You can refer to the ibuf_bitmap_page_no_calc() function for more details on the complete calculation.  Likewise, given a page number, the offset within the change buffer bitmap page that holds its 4 bits can be calculated easily.  I leave this as an exercise to the reader (refer to the function ibuf_bitmap_page_get_bits_low() for further info). The following table provides details about these 4 bits:

  • IBUF_BITMAP_FREE (value 0) – The first two bits are used to represent the free space available in the leaf page of the NUSI.
  • IBUF_BITMAP_BUFFERED (value 2) – The third bit, if set, means that the leaf page has buffered entries in the change buffer.
  • IBUF_BITMAP_IBUF (value 3) – The fourth bit, if set, means that this page is part of the change buffer.

This means that only 2 bits are available to store the free space information of the page, so there are only 4 possible values: 0, 1, 2, 3.  Using these 2 bits, we try to encode the free space information for a page.  The rule is that there must be at least UNIV_PAGE_SIZE / IBUF_PAGE_SIZE_PER_FREE_SPACE bytes of free space in the page for the change buffer to be used.

Tracking Free Space of Pages

Before an insert operation (IBUF_OP_INSERT) is buffered, the free space available in the target NUSI leaf page is approximately calculated using the information available in the change buffer bitmap page. This conversion is done in the ibuf_index_page_calc_free_from_bits() function and the formula used is:
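The exact expression lives in that function, but hedging a little, it is consistent with the conversion table below and is roughly:

    free_space_in_bytes = ibuf_code * UNIV_PAGE_SIZE / IBUF_PAGE_SIZE_PER_FREE_SPACE

where IBUF_PAGE_SIZE_PER_FREE_SPACE is 32, and the maximum encoded value 3 maps to 4 * UNIV_PAGE_SIZE / IBUF_PAGE_SIZE_PER_FREE_SPACE rather than 3.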

The following table provides the conversions done from the encoded value found within the change buffer bitmap page to a meaningful value in bytes:

  • 4 KB page size: ibuf_code 0 – 0 bytes, ibuf_code 1 – 128 bytes, ibuf_code 2 – 256 bytes, ibuf_code 3 – 512 bytes of free space assumed in the NUSI leaf page.
  • 16 KB page size: ibuf_code 0 – 0 bytes, ibuf_code 1 – 512 bytes, ibuf_code 2 – 1024 bytes, ibuf_code 3 – 2048 bytes of free space assumed in the NUSI leaf page.

Using this information, we can determine if the record to be buffered will fit into the page or not.  If there is enough space then the insert will be buffered.  Using this approach, we ensure that merging these records to the target NUSI will not result in a page split.

Updating Free Space Information

After buffering an insert or delete operation, the free space information in the change buffer bitmap page must be updated accordingly (a delete mark operation will not change the free space information).  To update the free space information we need to convert the free space in bytes back to the IBUF encoded value.  This is done in the ibuf_index_page_calc_free_bits() function using the following formula:
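Again hedging a little (the authoritative version is the function itself), this is essentially the inverse of the earlier conversion:

    ibuf_code = max_ins_size / (UNIV_PAGE_SIZE / IBUF_PAGE_SIZE_PER_FREE_SPACE)

with the result capped at 3, the largest value that fits in the two bits.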

In the above formula, max_ins_size is the maximum insert size (maximum free space) available in the page after page re-organization.

Record Count in NUSI Leaf Pages

In the case of a purge operation  (IBUF_OP_DELETE),  we need to ensure that the number of records in the target NUSI leaf page doesn’t go to zero because in InnoDB, the leaf pages of B-tree are not allowed to become empty.  They must have at least 1 record. Since the number of records in the target NUSI page is unknown (because it is not loaded into the buffer pool yet), the buffered insert operations are taken into account.  If 1 insert operation is buffered then it is assumed that the target NUSI page has 1 record, and if 2 insert operations are buffered then it is assumed that the target NUSI page has 2 records and so on.  In this way the number of records in the target NUSI page is calculated.  Based on this calculated record count, a purge operation is either buffered or not.  This means that if there are no buffered insert or delete-mark operations, then no purge operations can be buffered.

Merging the Buffered Changes to Target NUSI Page

The changes to NUSI leaf pages are buffered in the change buffer if the NUSI leaf page is not already in the buffer pool. These buffered operations are merged back into the actual NUSI leaf page under various circumstances:

  1. When these NUSI leaf pages are subsequently read into the buffer pool, the buffered operations will be merged.  A NUSI leaf page could be read into the buffer pool during an index lookup, an index scan or because of read ahead.
  2. When the master thread of InnoDB periodically does change buffer merges by calling ibuf_merge_in_background().
  3. When there are too many operations buffered for a particular NUSI leaf page.
  4. When the change buffer tree reaches its maximum allowed size.

The change buffer merge operation is initiated by calling the ibuf_merge_in_background() or ibuf_contract() functions.   The change buffer merges are done either in the foreground or in the background. The foreground change buffer merges are done as part of a DML operation and hence will affect the performance experienced by the end user.  The background change buffer merges are instead done periodically when there is less activity in the server.

We ensure that the change buffer merge does not result in a B-tree page split or page merge operation. It also shouldn’t result in an empty leaf page.  Before the target NUSI leaf page is placed into the buffer pool, the buffered changes are applied to it. Once the buffered changes are merged for a page, the associated 4 bits of information in the change buffer bitmap page are also updated.

Conclusion

This article provided an overview of the change buffer subsystem of InnoDB.  It explained the additional fields that are added to a secondary index record before storing it within the change buffer.  It provided information about how the change buffer keeps track of free space information of the NUSI leaf pages by making use of the change buffer bitmap page.  It also explained the need to calculate the number of records in the NUSI leaf pages so that it doesn’t become empty.

Thanks to Marko Makela for reviewing this article and helping to make it more accurate. If you have any questions, please feel free to post a comment below.

That’s all for now. THANK YOU for using MySQL!

 

Using Perl and MySQL to Automatically Respond to Retweets on Twitter

In my previous post, I showed you a way to store tweets in MySQL, and then use Perl to automatically publish them on Twitter.

In this post, we will look at automatically sending a “thank you” to people who retweet your tweets — and we will be using Perl and MySQL again.

Just like in the first post, you will need to register your application with Twitter via apps.twitter.com, and obtain the following: a consumer key, consumer secret, access token, and access token secret.

One caveat: Twitter has a rate limit on how often you may connect with your application — depending upon what you are trying to do. See the Rate Limiting and Rate Limits API docs for more information. What this means is that if you are going to put this into a cron job (or use some other automated scheduler), I wouldn’t run it more than once every 15 minutes or so.

We will also be using the same tables we created in the first post, tweets and history, as well as a new table named retweets. The retweets table will contain all of the user names and tweet IDs for those retweets we have discovered and for which we’ve already sent a “thank you” tweet response.

The Perl script will connect to your tweet history table and retrieve a set of your tweet ID’s, with the most recent tweet first. The script will then connect to Twitter and check to see if there are any retweets for each given ID. If an existing retweet is found, then the script will check your retweets table to see if you have already thanked the user for their retweet. If this is a new retweet, then the script will connect to Twitter, send the “thank you” message to that user, and finally insert the user name and tweet ID combination into the retweets table. This will ensure that you do not send repeat “thank you” responses.

Here is a flow chart to help explain what the script does:

We will be using the retweets(id) API call to see if a tweet ID was retweeted, and then we will send the “thank you” tweet via the update API call. You can find more information about the Perl Twitter module at Net::Twitter::Lite::WithAPIv1_1.

First we will need to create the retweets table, where we will store the information about our tweets that were retweeted. Here is the CREATE TABLE statement for the retweets table:
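The original statement is not reproduced here, but a minimal sketch that matches how the script uses the table (a user name plus the ID of our tweet that was retweeted) could look like this; the column names and sizes are my own guesses, so adapt them to your script:

    CREATE TABLE retweets (
      id        INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
      user_name VARCHAR(64)  NOT NULL,  -- the user who retweeted us
      tweet_id  VARCHAR(30)  NOT NULL,  -- the ID of our tweet that was retweeted
      date_sent DATETIME     NULL       -- when we sent the thank-you reply
    );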

Then you will need to edit the below script and insert the consumer_key, consumer_secret, access_token, and access_token_secret values for your application (which you get when registering your app with Twitter), and edit the accessTweets file (more on that shortly) used by the ConnectToMySql subroutine. (You may also want to comment-out the debug style “print” calls).

In the ConnectToMySql subroutine, I store the MySQL login credentials in a text file one directory below where my Perl script is located. This accessTweets file should contain the following information:

I tested this on two Twitter accounts and everything worked well for me, but please let me know if you have any problems. Please keep in mind, however, that I am not the best Perl programmer, nor am I an expert on the Twitter API, so there are likely better/easier ways to do this.

That’s it for now. I hope that this was interesting and may prove helpful. THANK YOU for using MySQL!

 


Tony Darnell is a Principal Sales Consultant for MySQL, a division of Oracle, Inc. MySQL is the world’s most popular open-source database program. Tony may be reached at info [at] ScriptingMySQL.com and on LinkedIn.

Using Perl to Send Tweets Stored in a MySQL Database to Twitter

Using twitter can sometimes feel like driving downtown, screaming what you want to say out the window, and hoping someone hears you. There might be tens of thousands of people downtown, but your message will only be heard by a few. Because of this, your best bet is to repeat your message as often as possible.

Twitter is free and if you want to reach as many people as possible, it’s another great tool for getting your message out there. But sending tweets on a scheduled basis can be a pain. There are client programs available which allow you to schedule your tweets (Hootsuite is one I have used in the past). You can load your tweets in the morning, and have the application tweet for you all day long. But you still have to load the application with your tweets—one by one.

A friend of mine asked me if there was a way to send the same 200 tweets over and over again, spaced out every 20 minutes or so. He has a consulting business, and just wants to build up a list of twitter followers by tweeting inspirational quotes. If he tweets for twenty hours a day, and sends three quotes an hour, it will take him a little more than three days to burn through his 200 quotes. And he can always add more quotes or space out the tweets as necessary. I decided to write a Perl script to do this for him.

To start, we will need a MySQL database to store the tweets. I use the MySQL Workbench product as my client application for connecting to MySQL. From within Workbench, I can create my tweet database:

I will then need a table inside my database to store my tweets.

The tweet messages will be stored in the tweet column, and the last date the tweet was sent will have a time stamp in the last_tweet_date column. When we perform a search to find the next tweet, we will simply sort our search by the last_tweet_date and then id, and limit our output to one tweet. After we send the tweet, we will update the last_tweet_date column and send that tweet to the end of the line. The script will also incrementally change the tweet_count value (number of times the tweet has been sent), and record the length of the tweet in tweet_length. I do not do any error checking in the script to make sure the tweet was sent, but errors are printed.
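The exact statements are not reproduced here, but a rough sketch matching the columns described above (the names and sizes are my assumptions) would be along these lines, including the query that picks the next tweet to send:

    CREATE DATABASE tweets;
    USE tweets;

    CREATE TABLE tweets (
      id              INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
      tweet           VARCHAR(150) NOT NULL,            -- the message to send
      last_tweet_date DATETIME     NULL,                -- when it was last sent
      tweet_count     INT UNSIGNED NOT NULL DEFAULT 0,  -- how many times it has been sent
      tweet_length    INT UNSIGNED NULL                 -- length of the tweet
    );

    -- Pick the next tweet to send: oldest (or never-sent) first, then by id
    SELECT id, tweet
      FROM tweets
     ORDER BY last_tweet_date, id
     LIMIT 1;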

We now need to insert some tweets into our table. Since my friend is going to be sending inspirational quotes, I found a few I can import. In order to make it easier for importing, all single quote marks (') will be replaced by the caret symbol (^). I can then swap these symbols inside the Perl script. You could use the backslash (\) before the single quote, but I prefer a single-character substitution so I know how long the tweet will be.

I will also use the tilde (~) as a way to designate a carriage return in my tweet. The Perl script will replace the tilde with a carriage return (\n). Two tildes give me two carriage returns and a blank line.

I also created a history table to store the tweet identification numbers. Each tweet is assigned a unique number by twitter, and this is how you can access this tweet. I save this information so I can delete the tweets later using this number. I have included a short script for deleting tweets near the end of this post.

You will need to register your application with twitter via apps.twitter.com, and obtain the following: a consumer key, consumer secret, access token, and access token secret.

You will also need to register your mobile phone in order to link your twitter account to your application. I have not figured out how to use this script with someone else’s account, as the instructions for scripting Perl for use with twitter are not very thorough. I will try to add this at a later date.

Now that you have your application information and all of your tables created with data inserted, here is the Perl script for sending tweets. (You will need to install the necessary Perl modules that are used.)

In the subroutine ConnectToMySql, I store the MySQL login credentials in a text file one directory below where my Perl script is located. This file contains the following information:

You can instead include your login information directly inside the script if you prefer.

If you want to delete your tweets, you can create a script to access the tweets in your history table, and then delete them one at a time. Here is an example without the database connections:

Be sure to replace the tweet ID value with the ID of the tweet you want to delete.

That’s it for now. Hopefully this was fun, interesting, and even useful! Thank you for using MySQL!

 


Tony Darnell is a Principal Sales Consultant for MySQL, a division of Oracle, Inc. MySQL is the world’s most popular open-source database program. Tony may be reached at info [at] ScriptingMySQL.com and on LinkedIn.

Creating and Restoring Database Backups With mysqldump and MySQL Enterprise Backup – Part 2 of 2

In part one of this post, I gave you a couple of examples of how to back up your MySQL databases using mysqldump. In part two, I will show you how to use MySQL Enterprise Backup (MEB) to create a full and partial backup.


MySQL Enterprise Backup provides enterprise-grade backup and recovery for MySQL. It delivers hot, online, non-blocking backups on multiple platforms including Linux, Windows, Mac & Solaris. To learn more, you may download a whitepaper on MEB.

MySQL Enterprise Backup delivers:

  • NEW! Continuous monitoring – Monitor the progress and disk space usage
  • “Hot” Online Backups – Backups take place entirely online, without interrupting MySQL transactions
  • High Performance – Save time with faster backup and recovery
  • Incremental Backup – Backup only data that has changed since the last backup
  • Partial Backup – Target particular tables or tablespaces
  • Compression – Cut costs by reducing storage requirements up to 90%
  • Backup to Tape – Stream backup to tape or other media management solutions
  • Fast Recovery – Get servers back online and create replicated servers
  • Point-in-Time Recovery (PITR) – Recover to a specific transaction
  • Partial restore – Recover targeted tables or tablespaces
  • Restore to a separate location – Rapidly create clones for fast replication setup
  • Reduce Failures – Use a proven high quality solution from the developers of MySQL
  • Multi-platform – Backup and Restore on Linux, Windows, Mac & Solaris

(from http://www.mysql.com/products/enterprise/backup.html)

While mysqldump is free to use, MEB is part of MySQL’s Enterprise Edition (EE) – so you need a license to use it. But if you are using MySQL in a production environment, you might want to look at EE, as:

MySQL Enterprise Edition includes the most comprehensive set of advanced features, management tools and technical support to achieve the highest levels of MySQL scalability, security, reliability, and uptime. It reduces the risk, cost, and complexity in developing, deploying, and managing business-critical MySQL applications.
(from: http://www.mysql.com/products/enterprise/)

Before using MEB and backing up your database for the first time, you will need some information:

    Information to gather – Where to Find It – How It Is Used

    • Path to MySQL configuration file – Default system locations, hardcoded application default locations, or from –defaults-file option in mysqld startup script. – This is the preferred way to convey database configuration information to the mysqlbackup command, using the –defaults-file option. When connection and data layout information is available from the configuration file, you can skip most of the other choices listed below.
    • MySQL port – MySQL configuration file or mysqld startup script. Used to connect to the database instance during backup operations. Specified via the –port option of mysqlbackup. –port is not needed if available from MySQL configuration file. Not needed when doing an offline (cold) backup, which works directly on the files using OS-level file permissions.
    • Path to MySQL data directory – MySQL configuration file or mysqld startup script. – Used to retrieve files from the database instance during backup operations, and to copy files back to the database instance during restore operations. Automatically retrieved from database connection for hot and warm backups. Taken from MySQL configuration file for cold backups.
    • ID and password of privileged MySQL user – You record this during installation of your own databases, or get it from the DBA when backing up databases you do not own. Not needed when doing an offline (cold) backup, which works directly on the files using OS-level file permissions. For cold backups, you log in as an administrative user. – Specified via the –password option of the mysqlbackup. Prompted from the terminal if the –password option is present without the password argument.
    • Path under which to store backup data – You choose this. See Section 3.1.3, of the MySQL online manual – “Designate a Location for Backup Data” for details. – By default, this directory must be empty for mysqlbackup to write data into it, to avoid overwriting old backups or mixing up data from different backups. Use the –with-timestamp option to automatically create a subdirectory with a unique name, when storing multiple sets of backup data under the same main directory.
    • Owner and permission information for backed-up files (for Linux, Unix, and OS X systems) – In the MySQL data directory. – If you do the backup using a different OS user ID or a different umask setting than applies to the original files, you might need to run commands such as chown and chmod on the backup data. See Section A.1, of the MySQL online manual – “Limitations of mysqlbackup Command” for details.
    • Size of InnoDB redo log files – Calculated from the values of the innodb_log_file_size and innodb_log_files_in_group configuration variables. Use the technique explained for the –incremental-with-redo-log-only option. – Only needed if you perform incremental backups using the –incremental-with-redo-log-only option rather than the –incremental option. The size of the InnoDB redo log and the rate of generation for redo data dictate how often you must perform incremental backups.
    • Rate at which redo data is generated – Calculated from the values of the InnoDB logical sequence number at different points in time. Use the technique explained for the –incremental-with-redo-log-only option. – Only needed if you perform incremental backups using the –incremental-with-redo-log-only option rather than the –incremental option. The size of the InnoDB redo log and the rate of generation for redo data dictate how often you must perform incremental backups.

For most backup operations, the mysqlbackup command connects to the MySQL server through –user and –password options. If you aren’t going to use the root user, then you will need to create a separate user. Follow these instructions for setting the proper permissions.

All backup-related operations either create new files or reference existing files underneath a specified directory that holds backup data. Choose this directory in advance, on a file system with sufficient storage. (It could even be remotely mounted from a different server.) You specify the path to this directory with the –backup-dir option for many invocations of the mysqlbackup command.

Once you establish a regular backup schedule with automated jobs, it is preferable to keep each backup within a timestamped subdirectory underneath the main backup directory. To make the mysqlbackup command create these subdirectories automatically, specify the –with-timestamp option each time you run mysqlbackup.

For one-time backup operations, for example when cloning a database to set up a replication slave, you might specify a new directory each time, or specify the –force option of mysqlbackup to overwrite older backup files.
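Returning to the separate backup user mentioned above, here is a rough, hedged sketch. The exact privilege list depends on the MEB version and the operations you use, so treat this as an outline and check the MySQL Enterprise Backup manual rather than copying it verbatim:

    CREATE USER 'mysqlbackup'@'localhost' IDENTIFIED BY 'backup_password';
    GRANT RELOAD, SUPER, PROCESS, REPLICATION CLIENT ON *.* TO 'mysqlbackup'@'localhost';
    -- mysqlbackup records its progress and history in the mysql schema:
    GRANT CREATE, INSERT, DROP, UPDATE ON mysql.backup_progress TO 'mysqlbackup'@'localhost';
    GRANT CREATE, INSERT, DROP, UPDATE ON mysql.backup_history  TO 'mysqlbackup'@'localhost';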

      If you haven’t downloaded and installed mysqlbackup, you may download it from edelivery.oracle.com (registration is required). Install the MySQL Enterprise Backup product on each database server whose contents you intend to back up. You perform all backup and restore operations locally, by running the mysqlbackup command on the same server as the MySQL instance.

      Now that we have gathered all of the required information and installed mysqlbackup, let’s run a simple and easy backup of the entire database. I installed MEB in my /usr/local directory, so I am including the full path of mysqlbackup. I am using the backup-and-apply-log option, which combines the –backup and the –apply-log options into one. The –backup option performs the initial phase of a backup. The second phase is performed later by running mysqlbackup again with the –apply-log option, which brings the InnoDB tables in the backup up-to-date, including any changes made to the data while the backup was running.

      Now, I can take a look at the backup file that was created:

      As you can see, the backup was created in /Users/tonydarnell/hotbackups. If I wanted to have a unique folder for this backup, I can use the –with-timestamp option.

      The –with-timestamp option places the backup in a subdirectory created under the directory you specified above. The name of the backup subdirectory is formed from the date and the clock time of the backup run.

      I will run the same backup command again, but with the –with-timestamp option:

      (I am not going to duplicate the entire output – but I will only show you the output where it creates the sub-directory under /Users/tonydarnell/hotbackups)

      So, I ran the backup again to get a unique directory. Instead of the backup files/directories being placed in /Users/tonydarnell/hotbackups, it created a sub-directory with a timestamp for the directory name:

      Note: If you don’t use the backup-and-apply-log option you will need to read this: Immediately after the backup job completes, the backup files might not be in a consistent state, because data could be inserted, updated, or deleted while the backup is running. These initial backup files are known as the raw backup.

      You must update the backup files so that they reflect the state of the database corresponding to a specific InnoDB log sequence number. (The same kind of operation as crash recovery.) When this step is complete, these final files are known as the prepared backup.

      During the backup, mysqlbackup copies the accumulated InnoDB log to a file called ibbackup_logfile. This log file is used to “roll forward” the backed-up data files, so that every page in the data files corresponds to the same log sequence number of the InnoDB log. This phase also creates new ib_logfiles that correspond to the data files.

      The mysqlbackup option for turning a raw backup into a prepared backup is apply-log. You can run this step on the same database server where you did the backup, or transfer the raw backup files to a different system first, to limit the CPU and storage overhead on the database server.
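      (If you took a raw backup with the --backup option, the follow-up step is a sketch like this, pointing at the same backup directory:)

      /usr/local/meb/bin/mysqlbackup --backup-dir=/Users/tonydarnell/hotbackups apply-log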

      Note: Since the apply-log operation does not modify any of the original files in the backup, nothing is lost if the operation fails for some reason (for example, insufficient disk space). After fixing the problem, you can safely retry apply-log, specifying the –force option, which allows the data and log files created by the failed apply-log operation to be overwritten.

      For simple backups (without compression or incremental backup), you can combine the initial backup and the apply-log step using the option backup-and-apply-log.

      MEB 3.9 and later creates two .cnf files based on the output of SHOW GLOBAL VARIABLES: server-my.cnf (non-default values) and server-all.cnf (all values).

      Now that we have a completed backup, we are going to copy the backup files and the my.cnf file over to a different server to restore the databases. We will be using a server that was setup as a slave server to the server where the backup occurred. If you need to restore the backup to the same server, you will need to refer to the mysqlbackup manual. I copied the backup files as well as the my.cnf file to the new server:
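      (Any remote-copy mechanism works for this step; for example, something along these lines, where the user, host name, and destination paths are placeholders:)

      scp -r /Users/tonydarnell/hotbackups/2015-03-12_15-15-04 tony@newserver:/Users/tonydarnell/hotbackups/
      scp /etc/my.cnf tony@newserver:/tmp/my.cnf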

      On the new server (where I will restore the data), I shutdown the mysqld process (mysqladmin -uroot -p shutdown), copied the my.cnf file to the proper directory, and now I can restore the database to the new server, using the copy-back option. The copy-back option requires the database server to be already shut down, then copies the data files, logs, and other backed-up files from the backup directory back to their original locations, and performs any required postprocessing on them.
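      (A sketch of that restore step, assuming the backup and my.cnf were copied to the paths shown earlier; mysqlbackup reads the data directory location from the defaults file:)

      /usr/local/meb/bin/mysqlbackup --defaults-file=/etc/my.cnf \
        --backup-dir=/Users/tonydarnell/hotbackups/2015-03-12_15-15-04 \
        copy-back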

      I can now restart MySQL. I have a very small database (less than 50 megabytes), and it took less than a minute to restore it. If I had to rebuild my database using mysqldump, it would take a lot longer. If you have a very large database, the difference between using mysqlbackup and mysqldump could be measured in hours. For example, a 32-gigabyte database with 33 tables takes about eight minutes to restore with mysqlbackup. Restoring the same database with a mysqldump file takes over two hours.

      An easy way to check whether the databases match (assuming that I haven’t added any new records in any of the original databases – which I haven’t) is to use one of the MySQL Utilities – mysqldbcompare. I wrote about how to do this in an earlier blog post about using it to test two replicated databases, and it works here as well – see Using MySQL Utilities Workbench Script mysqldbcompare To Compare Two Databases In Replication.

      The mysqldbcompare utility “compares the objects and data from two databases to find differences. It identifies objects having different definitions in the two databases and presents them in a diff-style format of choice. Differences in the data are shown using a similar diff-style format. Changed or missing rows are shown in a standard format of GRID, CSV, TAB, or VERTICAL.” (from: mysqldbcompare — Compare Two Databases and Identify Differences)

      Some of the syntax may have changed for mysqldbcompare since I wrote that blog, so you will need to reference the help notes for mysqldbcompare. You would need to run this for each of your databases.
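      (The invocation looks roughly like this; the server addresses and the database name are placeholders, and you would repeat it for each database:)

      mysqldbcompare --server1=root@original-server --server2=root@restored-server \
        database_name:database_name --run-all-tests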

      You can try and run this for the mysql database, but you may get a few errors regarding the mysql.backup_history and mysql.backup_progress tables:

      For example, when you compare the mysql.backup_history tables, the original database will have two entries – as I ran mysqlbackup twice. But the second backup entry doesn’t get entered until after the backup has occurred, and it isn’t reflected in the backup files.

      Original Server

      Restored Server

      For the mysql.backup_progress tables, the original database has ten rows, while the restored database has seven.

      There are many options for using mysqlbackup, including (but not limited to) incremental backup, partial backup, compression, backup to tape, point-in-time recovery (PITR), partial restore, etc. If you are running MySQL in a production environment, then you should look at MySQL Enterprise Edition, which includes MySQL Enterprise Backup. Of course, you should always have a backup and recovery plan in place. Finally, if and when possible, practice restoring your backup on a regular basis, to make sure that if your server crashes, you can restore your database quickly.

       


      Tony Darnell is a Principal Sales Consultant for MySQL, a division of Oracle, Inc. MySQL is the world’s most popular open-source database program. Tony may be reached at info [at] ScriptingMySQL.com and on LinkedIn.
      Tony is the author of Twenty Forty-Four: The League of Patriots

      Visit http://2044thebook.com for more information.

Creating and Restoring Database Backups With mysqldump and MySQL Enterprise Backup – Part 1 of 2

Part 1 of 2: (part two)
If you have used MySQL for a while, you have probably used mysqldump to backup your database. In part one of this blog, I am going to show you how to create a simple full and partial backup using mysqldump. In part two, I will show you how to use MySQL Enterprise Backup (which is the successor to the InnoDB Hot Backup product). MySQL Enterprise Backup allows you to backup your database while it is online and it keeps the database available to users during backup operations (you don’t have to take the database offline or lock any databases/tables – but to do this, you need to use the –no-locking option).

This post will deal with mysqldump. For those of you that aren’t familiar with mysqldump:

The mysqldump client is a utility that performs logical backups, producing a set of SQL statements that can be run to reproduce the original schema objects, table data, or both. It dumps one or more MySQL databases for backup or transfer to another SQL server. The mysqldump command can also generate output in CSV, other delimited text, or XML format.

The best feature of mysqldump is that it is easy to use. The main problem with using mysqldump occurs when you need to restore a database. When you execute mysqldump, the database backup (output) is an SQL file that contains all of the necessary SQL statements to restore the database – but restoring requires that you execute these SQL statements to essentially rebuild the database. Since you are recreating your database, its tables, and all of your data from this file, the restoration can take a long time to execute if you have a very large database.

There are a lot of features and options with mysqldump – (a complete list is here). I won’t review all of the features, but I will explain some of the ones that I use.

Here is the command to use mysqldump to simply backup all of your databases (assuming you have InnoDB tables). This command will create a dump (backup) file named all_databases.sql.
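(A sketch of that command, using the root user; --single-transaction is what allows a consistent dump of InnoDB tables without locking them:)

mysqldump --all-databases --single-transaction --user=root --password > all_databases.sql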

After you hit return, you will have to enter your password. You can include the password after the –password option (example: –password=my_password), but this is less secure and you will get the following error:
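(In MySQL 5.6 and later this shows up as a warning rather than a hard error, and it looks like this:)

Warning: Using a password on the command line interface can be insecure.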

Here is some information about the options that were used:
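(For the sketch above:)

  • --all-databases – dumps every database on the server.
  • --single-transaction – takes a consistent snapshot for transactional tables such as InnoDB without locking them for the duration of the dump.
  • --user and --password – the connection credentials; leaving the password value off makes mysqldump prompt for it.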

If you are using Global Transaction Identifiers (GTIDs) with InnoDB (GTIDs aren’t available with MyISAM), you will want to use the –set-gtid-purged=OFF option. Then you would issue this command:
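(The same sketch as above, with the extra option added:)

mysqldump --all-databases --single-transaction --set-gtid-purged=OFF --user=root --password > all_databases.sql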

Otherwise you will see this error:
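(The message reads roughly as follows; the exact wording varies by version:)

Warning: A partial dump from a server that has GTIDs will by default include the GTIDs of all transactions, even those that changed suppressed parts of the database. If you don't want to restore GTIDs, pass --set-gtid-purged=OFF. To make a complete dump, pass --all-databases --triggers --routines --events.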

You can also execute a partial backup of all of your databases. This example will be a partial backup because I am not going to backup the default databases for MySQL (which are created during installation) – mysql, test, PERFORMANCE_SCHEMA and INFORMATION_SCHEMA.

Note: mysqldump does not dump the INFORMATION_SCHEMA database by default. To dump INFORMATION_SCHEMA, name it explicitly on the command line and also use the –skip-lock-tables option.

mysqldump never dumps the performance_schema database.

mysqldump also does not dump the MySQL Cluster ndbinfo information database.

Before MySQL 5.6.6, mysqldump does not dump the general_log or slow_query_log tables for dumps of the mysql database. As of 5.6.6, the dump includes statements to recreate those tables so that they are not missing after reloading the dump file. Log table contents are not dumped.

If you encounter problems backing up views due to insufficient privileges, see Section E.5, “Restrictions on Views” for a workaround.
(from: http://dev.mysql.com/doc/refman/5.6/en/mysqldump.html)

To do a partial backup, you will need a list of the databases that you want to backup. You may retrieve a list of all of the databases by simply executing the SHOW DATABASES command from a mysql prompt:
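(Output along these lines, given the databases mentioned below:)

mysql> SHOW DATABASES;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| comicbookdb        |
| coupons            |
| mysql              |
| performance_schema |
| scripts            |
| test               |
| watchdb            |
+--------------------+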

In this example, since I don’t want to backup the default mysql databases, I am only going to backup the comicbookdb, coupons, scripts and watchdb databases. I am going to use the following options:
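(These are the options used in the command sketched below:)

  • --databases – dumps only the databases named on the command line, and writes CREATE DATABASE and USE statements into the output so they can be recreated on restore.
  • --single-transaction – consistent InnoDB snapshot without locking, as before.
  • --user and --password – the connection credentials.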

Here is the command that I will run from a prompt:
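(A sketch of that command; the output file name matches the one referenced below:)

mysqldump --databases comicbookdb coupons scripts watchdb --single-transaction --user=root --password > partial_database_backup.db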

I will need to enter my password on the command line. After the backup has completed, if your backup file isn’t too large, you can open it and see the actual SQL statements that will be used if you decide that you need to recreate the database(s). If you accidentally dump all of the databases into one file, and you want to separate the dump file into smaller files, see my post on using Perl to split the dump file.

For example, here is the section of the dump file (partial_database_backup.db) for the comicbookdb database (without the table definitions). (I omitted the headers from the dump file.)
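(Something along these lines, where the table name and values are only placeholders standing in for the real comicbookdb contents:)

--
-- Current Database: `comicbookdb`
--

CREATE DATABASE /*!32312 IF NOT EXISTS*/ `comicbookdb` /*!40100 DEFAULT CHARACTER SET latin1 */;

USE `comicbookdb`;

LOCK TABLES `comics` WRITE;
INSERT INTO `comics` VALUES (1,'Detective Comics',27),(2,'Action Comics',1);
UNLOCK TABLES;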

If you are using the dump file to create a slave server, you can use the –master-data option, which includes the CHANGE MASTER information, which looks like this:
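(The binary log file name and position below are placeholders:)

CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.000001', MASTER_LOG_POS=154;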

If you did not use the –set-gtid-purged=OFF option, you would also see the value of the Global Transaction Identifiers (GTIDs) near the top of the dump:
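(The server UUID and transaction range shown here are placeholders:)

SET @@GLOBAL.GTID_PURGED='82b3e6f0-d8a9-11e4-8a67-0800273f55a2:1-177';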

You may also test your backup without exporting any data by using the –no-data option. This will show you all of the information for creating the databases and tables, but it will not export any data. This is also useful for recreating a blank database on the same or on another server.
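(For example, a schema-only dump of the same databases; the output file name is just an illustration:)

mysqldump --no-data --databases comicbookdb coupons scripts watchdb --user=root --password > schema_only.sql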

When you export your data, mysqldump will create INSERT INTO statements to import the data into the tables. However, the default is for the INSERT INTO statements to contain multiple-row INSERT syntax that includes several VALUES lists. This allows for a quicker import of the data. But, if you think that your data might be corrupt, and you want to be able to isolate a given row of data – or if you simply want to have one INSERT INTO statement per row of data, then you can use the –skip-extended-insert option. If you use the –skip-extended-insert option, importing the data will take much longer to complete, and the backup file size will be larger.
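(To illustrate the difference, using a hypothetical comics table:)

-- Default (extended insert): many rows per INSERT statement
INSERT INTO `comics` VALUES (1,'Detective Comics'),(2,'Action Comics'),(3,'Amazing Fantasy');

-- With --skip-extended-insert: one INSERT statement per row
INSERT INTO `comics` VALUES (1,'Detective Comics');
INSERT INTO `comics` VALUES (2,'Action Comics');
INSERT INTO `comics` VALUES (3,'Amazing Fantasy');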

Importing and restoring the data is easy. To import the backup file into a new, blank instance of MySQL, you can simply use the mysql command to import the data:
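(For example, to load the full dump created earlier:)

mysql --user=root --password < all_databases.sql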

Again, you will need to enter your password or you can include the value after the -p option (less secure).

There are many more options that you can use with mysqldump (see the mysqldump section of the reference manual at http://dev.mysql.com/doc/refman/5.6/en/mysqldump.html). The main thing to remember is that you should backup your data on a regular basis, and move a copy of the backup file off the MySQL server.

Finally, here is a Perl script that I use in cron to backup my databases. This script allows you to specify which databases you want to backup via the mysql_bak.config file. This config file is simply a list of the databases that you want to backup, with an option to ignore any databases that are commented out with a #. This isn’t a secure script, as you have to embed the MySQL user password in the script.

 


Tony Darnell is a Principal Sales Consultant for MySQL, a division of Oracle, Inc. MySQL is the world’s most popular open-source database program. Tony may be reached at info [at] ScriptingMySQL.com and on LinkedIn.
Tony is the author of Twenty Forty-Four: The League of Patriots

 

Visit http://2044thebook.com for more information.

InnoDB Full-Text Search: N-gram Parser

The default InnoDB full-text parser is ideal for Latin-based languages, where whitespace or word separators mark the token boundaries. In languages such as Chinese, Japanese, and Korean (CJK), however, there is no fixed separator for individual words, and each word can be made up of a combination of several characters, so a different way of handling word tokens is needed.

We are very happy to announce that MySQL 5.7.6 includes a new pluggable full-text parser in order to provide an n-gram parser that can be used with CJK.

What exactly is an n-gram?

In full-text search, an n-gram is a contiguous sequence of n characters from a given string. For example, using n-grams we tokenize the string “abcd” as follows:
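For token sizes 1 through 4, the tokens produced from “abcd” are:

n=1: 'a', 'b', 'c', 'd'
n=2: 'ab', 'bc', 'cd'
n=3: 'abc', 'bcd'
n=4: 'abcd'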

How can you use the n-gram parser in InnoDB?

The new n-gram parser is loaded and enabled by default, so to use it you simply specify WITH PARSER ngram in the relevant DDL statements. For example, all of the following statements can be used in MySQL 5.7.6 and later:
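(A sketch of those statements, using a hypothetical articles table; any one of them creates an n-gram full-text index:)

CREATE TABLE articles (
  id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
  title VARCHAR(100),
  body TEXT,
  FULLTEXT INDEX ngram_idx (title, body) WITH PARSER ngram
) ENGINE=InnoDB;

CREATE FULLTEXT INDEX ngram_idx ON articles (title, body) WITH PARSER ngram;

ALTER TABLE articles ADD FULLTEXT INDEX ngram_idx (title, body) WITH PARSER ngram;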

MySQL 5.7.6 also introduces a new global server variable called ngram_token_size (a token being roughly equivalent to a word, made up of n characters). Its default value is 2 (bigram), and valid values range from 1 to 10. The next question is usually: how should you choose the token size? In general, 2, or bigram, is recommended for CJK, but you can choose a valid value by following this simple rule:

Rule: set the token size to the size of the largest token that you want to search for.

If you want to search for single characters, then you must set ngram_token_size to 1. The smaller ngram_token_size is, the smaller the index, and the faster the full-text searches that use that index. The downside, however, is that it limits the size of the tokens you can search for. For example, the English phrase “Happy Birthday” translates to ‘生日高興’ in traditional Chinese (‘Happy’ == ‘高興’, ‘Birthday’ == ‘生日’). As in this example, each word/token consists of 2 characters, so in order to search for these tokens you must set ngram_token_size to 2 or higher.

Details of n-gram tokenization

The n-gram parser differs from the default full-text parser in the following ways:

  1. Token size: innodb_ft_min_token_size and innodb_ft_max_token_size are ignored. Instead, you specify ngram_token_size to control the token size.
  2. Stopword handling: stopword processing is also a little different. Normally, if a tokenized word itself (an exact match) appears in the stopword table, that word is not added to the full-text index. With the n-gram parser, however, we check whether the tokenized word contains any stopword, and if it does, the token is not indexed. The reason for this different behavior is that CJK has a great many meaningless characters, words, and punctuation marks that occur very frequently. Checking whether a token contains a character that matches a stopword allows us to strip out many more useless tokens.
  3. Whitespace: whitespace is always a hard-coded stopword. For example, ‘my sql’ is always tokenized into ‘my’, ‘y ’, ‘ s’, ‘sq’, and ‘ql’, and the tokens ‘y ’ and ‘ s’ (which contain a space) are not indexed.

You can check exactly which tokens are indexed in a given full-text index by querying the INFORMATION_SCHEMA.INNODB_FT_INDEX_CACHE and INFORMATION_SCHEMA.INNODB_FT_INDEX_TABLE tables. This is a very useful tool for debugging. For example, if a word does not show up in the full-text search results as expected, you can consult these tables to find out why it was not indexed (a stopword, the token size, and so on). Here is a simple example:
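(A sketch of that kind of check, reusing the hypothetical articles table from above; the test/ prefix assumes the table lives in the test schema, and INNODB_FT_INDEX_CACHE only returns rows for the table named in the innodb_ft_aux_table variable, showing tokens for recently inserted rows:)

INSERT INTO articles (title, body) VALUES ('MySQL', 'my sql');

SET GLOBAL innodb_ft_aux_table = 'test/articles';

SELECT word, doc_id, position FROM INFORMATION_SCHEMA.INNODB_FT_INDEX_CACHE;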

Details of n-gram search processing

Text searches

In NATURAL LANGUAGE MODE, the search text is converted into a union of n-grams. For example, ‘sql’ is converted to ‘sq ql’ (with the default token size of 2, or bigram).

In BOOLEAN MODE, the search text is converted into an n-gram phrase search. For example, ‘sql’ is converted to ‘“sq ql”’.

Translator’s note: “sq ql” requires both ‘sq’ and ‘ql’ to match, in that order, so rows that match only ‘sq’ or only ‘ql’ do not appear in the search results.
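(For example, against the hypothetical articles table above; the conversion to n-gram tokens happens internally:)

SELECT * FROM articles
 WHERE MATCH (title, body) AGAINST ('sql' IN NATURAL LANGUAGE MODE);

SELECT * FROM articles
 WHERE MATCH (title, body) AGAINST ('sql' IN BOOLEAN MODE);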

Searches with wildcards

If the prefix is shorter than ngram_token_size, the search returns all rows containing n-gram tokens that start with that prefix.

If the prefix is equal to or longer than ngram_token_size, the wildcard search is converted into a phrase search and the wildcard is ignored. For example, ‘sq*’ is converted to ‘“sq”’, and ‘sql*’ is converted to ‘“sq ql”’.

Phrase searches

A phrase search is converted into a phrase search of n-gram tokens. For example, “sql” is converted to “sq ql”.

If you would like to learn more about InnoDB full-text search in general, please see the InnoDB Full-Text Index section of the user manual and Jimmy’s Dr. Dobb’s article. For more details on the n-gram parser, please see the N-gram parser section of the user manual.

We hope that you find this new feature useful! We are also very happy that MySQL 5.7 brings full-text search to CJK; this is one of the bigger improvements in MySQL 5.7. If you have any questions, please leave a comment on this blog post or open a support ticket with Oracle. If you happen to find a bug, please leave a comment here, file a bug report, or contact Oracle Support.

As always, thank you for using MySQL!