Performance Impact of InnoDB Transaction Isolation Modes in MySQL 5.7

During the process of reviewing our server defaults for MySQL 5.7, we thought that it might be better to change the default transaction isolation level from REPEATABLE-READ to READ-COMMITTED (the default for PostgreSQL, Oracle, and SQL Server). After some benchmarking, however, it seems that we should stick with REPEATABLE-READ as the default for now.

It’s very easy to modify the default isolation level, however, and it can even be done at the SESSION level. For optimal performance you can change the transaction isolation level dynamically within your SESSION according to the situation (see the example after this list):

  • For short-running queries and transactions, use the default level of REPEATABLE-READ.
  • For long-running queries and transactions, use the level of READ-COMMITTED.
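
For example, to switch the current session to READ-COMMITTED:

mysql> SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;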

You can find the full details of our recent benchmarking efforts on this topic here: http://dimitrik.free.fr/blog/archives/2015/02/mysql-performance-impact-of-innodb-transaction-isolation-modes-in-mysql-57.html

As always, THANK YOU for using MySQL!

MySQL Dumping and Reloading the InnoDB Buffer Pool

MySQL’s default storage engine as of version 5.5 is InnoDB. InnoDB maintains a storage area called the buffer pool for caching data and indexes in memory. By keeping frequently accessed data in memory, queries against that data are served much faster than they would be from disk.

When you stop or restart MySQL, you lose the cached data stored in the buffer pool. MySQL 5.6 includes a feature that allows you to dump the contents of the buffer pool before you shut down the mysqld process. Then, when you start mysqld again, you can reload the contents of the buffer pool back into memory. You may also dump the buffer pool at any time for reloading later.

To see information about the buffer pool, use the SHOW ENGINE INNODB STATUS command:
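
For example (output trimmed down to the buffer pool section; the page count here is from my server):

mysql> SHOW ENGINE INNODB STATUS\G
...
----------------------
BUFFER POOL AND MEMORY
----------------------
...
Database pages      1195
...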

In this example the buffer pool contains 1195 database pages (a very small one, from my home server). When you dump the buffer pool to disk, only references to the database pages are recorded, not the page data itself. When you restart mysqld, the data from these pages will be loaded back into memory.

You may dump the buffer pool with this command:
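
mysql> SET GLOBAL innodb_buffer_pool_dump_now=ON;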

The buffer pool dump file (named ib_buffer_pool by default, controlled by the innodb_buffer_pool_filename variable) is stored in your MySQL data directory.

The dump is a plain-text file, and we can see that the file is 1195 lines long and contains only the database page references (one space_id,page_no pair per line).

If you have a large buffer pool, you can check on the status of the dump with this command:
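
mysql> SHOW STATUS LIKE 'Innodb_buffer_pool_dump_status';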

If you want to save the buffer pool when MySQL is shutdown or restarted, use this command:
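
mysql> SET GLOBAL innodb_buffer_pool_dump_at_shutdown=ON;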

To restore the buffer pool when starting MySQL, append this option to your mysqld command:
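
--innodb_buffer_pool_load_at_startup=ON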

Or, to load the buffer pool file while mysqld is running, use this command:
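
mysql> SET GLOBAL innodb_buffer_pool_load_now=ON;

You can check on the progress of the load via the Innodb_buffer_pool_load_status status variable.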

Reloading the buffer pool is very fast, and it is performed in the background so users will not be affected. More information about preloading the buffer pool may be found at http://dev.mysql.com/doc/refman/5.6/en/innodb-preload-buffer-pool.html.

 


Tony Darnell is a Principal Sales Consultant for MySQL, a division of Oracle. MySQL is the world’s most popular open-source database program. Tony may be reached at info [at] ScriptingMySQL.com and on LinkedIn.
Tony is the author of Twenty Forty-Four: The League of Patriots

 

Visit http://2044thebook.com for more information.

Extent Descriptor Page of InnoDB

Within the MySQL data directory, the InnoDB storage engine creates two types of files — the data files and the redo log files. Each data file (or ibd file) belongs to exactly one tablespace. Each tablespace is given a unique identifier called the space_id. One tablespace can have 1 or more data files. If a tablespace has more than one data file, then the data files have a specific order or sequence. The data files can be thought of as being concatenated to each other in that specific order.

The data file is made up of a series of equal sized pages. Each page in the data file is given a unique number identifier called the page number (page_no). The first page of the first ibd file is given the page_no of 0. The page number of the first page of the second ibd file of the tablespace is one more than the page number of the last page of the first ibd file and so on. So the page number is unique within a tablespace even if it has multiple data files. Given the combination of “space_id, page_no”, a single page in any data file can be uniquely identified within the given data directory.

The data files contain user data and system data. The user data is what the database applications and users will store in the database. The system data is what InnoDB creates and maintains in order to implement its storage engine features (for example: the page headers, tablespace headers, segment information, and undo logs). Each individual page is also categorized into a given page type based on how that particular page is used. One such page type is an extent descriptor (xdes) page. In this post we’ll examine the details of an extent descriptor page; we’ll look at both its structure and its contents.

An Extent

An InnoDB data file or ibd file is a sequence of equal sized pages (supported page sizes). These pages are grouped into extents and segments. One extent is a contiguous sequence of pages in an ibd file. The number of pages belonging to an extent depends on the page size. The following table provides the relationship between page size and extent size (in pages and megabytes). The extent size is accessed by using the macro FSP_EXTENT_SIZE in the source code.

Page Size | Extent Size in Pages | Extent Size in MB
----------|----------------------|------------------
4KB       | 256                  | 1MB
8KB       | 128                  | 1MB
16KB      | 64                   | 1MB
32KB      | 64                   | 2MB
64KB      | 64                   | 4MB

An extent is the basic unit of file space allocation in InnoDB. A segment is made up of a sequence of extents. A segment can contain non-contiguous extents. Refer to Data Organization in InnoDB for more details.

An Extent Descriptor

An extent descriptor (xdes) provides information about an extent. It contains the following pieces of information:

  • The identifier of the segment to which the extent belongs.
  • The location of the previous extent descriptor within the same segment.
  • The location of the next extent descriptor within the same segment.
  • The state of the extent.
  • The allocation bitmap of the pages within the extent. The size of this bitmap depends on the number of pages in the extent. For each page, 2 bits (XDES_BITS_PER_PAGE) are used. So if the extent has 64 pages, then the size of this allocation bitmap per extent is 16 bytes.

For each page, the allocation bitmap has 2 bits: XDES_FREE_BIT and XDES_CLEAN_BIT. The XDES_CLEAN_BIT is currently unused. The XDES_FREE_BIT tells you whether the page is free or currently in use. So the extent descriptor is what tells us whether each page within that extent is free or used, and thus it helps to locate free pages within an extent.

The extent descriptor also tells us the state of an extent. The extent can be in any of the following states:

  • XDES_FREE — The extent is in the free list and the space can be used.
  • XDES_FSEG — The extent belongs to a segment.
  • XDES_FREE_FRAG — The extent belongs to the free fragment list.
  • XDES_FULL_FRAG — The extent belongs to the full fragment list.

When an extent is yet to be used by any segments, it will be in the XDES_FREE state, meaning that it is “free space”. When the whole extent is allocated to a segment, then it will go into the XDES_FSEG state. There are situations in which InnoDB will decide that individual pages within a particular extent can be allocated to different segments. Such an extent will then be put in the free fragment list using the XDES_FREE_FRAG state. When a fragmented extent is completely used (no free pages), then it will go into the XDES_FULL_FRAG state.

The Structure of an Extent Descriptor

The following C++ code snippet is provided as a means to inspect the structure of an extent descriptor (xdes). All other structures needed to define an xdes are also provided in this code snippet. Note the use of the “packed” attribute for the structures. This is necessary to avoid unnecessary padding. (This also means that you should be aware of data structure alignment issues.) Also, note that the data stored in the file will be in network byte order, thus any necessary byte order conversions must be done while reading and writing to the file. The complete compilable sample program is given in the appendix. Here’s a snippet:
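
A minimal sketch, assuming the default 16KB page size (64 pages per extent); the field names only approximate those used in the InnoDB source:

#include <cstdint>

/* An on-disk file address: a page number plus a byte offset within that page. */
struct __attribute__((__packed__)) fil_addr_t {
    uint32_t page;    /* page_no within the tablespace */
    uint16_t boffset; /* byte offset within the page */
};

/* A node of the on-disk doubly linked list that connects extent descriptors. */
struct __attribute__((__packed__)) flst_node_t {
    fil_addr_t prev;  /* location of the previous xdes in the list */
    fil_addr_t next;  /* location of the next xdes in the list */
};

/* An extent descriptor for a 64-page extent: 8 + 12 + 4 + 16 = 40 bytes. */
struct __attribute__((__packed__)) xdes_t {
    uint64_t    fseg_id;    /* id of the segment to which the extent belongs */
    flst_node_t flst_node;  /* links to the previous/next extent descriptors */
    uint32_t    state;      /* XDES_FREE, XDES_FSEG, XDES_FREE_FRAG, ...     */
    uint8_t     bitmap[16]; /* 2 bits (XDES_BITS_PER_PAGE) per page          */
};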

An Extent Descriptor Page

Now let’s discuss the contents of an extent descriptor page. An extent descriptor page contains primarily an array of extent descriptors. The primary purpose of this page type is to maintain the allocation information of the pages within the extents that it describes. Each page of the tablespace whose page number is a multiple of the page size in bytes is an extent descriptor (xdes) page. If the page size is 4K, then the page numbers 0, 4096, 8192, 12288 and so on are the xdes pages. The first extent descriptor page is referred to by using the macro FSP_XDES_OFFSET within the InnoDB source code. The page type of an extent descriptor page is denoted with the FIL_PAGE_TYPE_XDES type. One xdes page will contain an array of extent descriptors, and each extent descriptor will provide allocation information for one extent. The extent descriptor page contains the following items:

  • The file page header (38 bytes).
  • The file space header (112 bytes).
  • An array of extent descriptors (variable, depending on the page size).
  • The file page trailer (8 bytes).

The following table provides information about the number of extent descriptors that will be stored in an extent descriptor page, for each given page size.

Page Size      | Extent Size in Pages | Extent Descriptor Page Numbers | Allocation Bitmap Size in One Extent Descriptor | Size of 1 Extent Descriptor | Number of Extent Descriptors Stored in an xdes Page
---------------|----------------------|--------------------------------|-------------------------------------------------|-----------------------------|----------------------------------------------------
4KB            | 256                  | 0, 4096, 8192, 12288, ...      | 64 bytes                                        | 88 bytes                    | 16
8KB            | 128                  | 0, 8192, 16384, 24576, ...     | 32 bytes                                        | 56 bytes                    | 64
16KB (default) | 64                   | 0, 16384, 32768, 49152, ...    | 16 bytes                                        | 40 bytes                    | 256
32KB           | 64                   | 0, 32768, 65536, ...           | 16 bytes                                        | 40 bytes                    | 512
64KB           | 64                   | 0, 65536, 131072, ...          | 16 bytes                                        | 40 bytes                    | 1024

Accessing an Extent Descriptor

With the given information, one can now print the extent descriptors stored in the xdes pages of an InnoDB data file. Here is a short program (mainly for demonstration purposes) to print the state of the first extent descriptor in the first xdes page of an InnoDB data file (ibd file). An important point to remember is that data stored on the disk will be in network byte order, so while reading data into memory, appropriate byte order conversion must be done. This program takes an ibd file as an argument (if it’s the system tablespace, then provide the first ibd file of the system tablespace). And yes, I am not checking the value of argc! Again, the complete sample program is provided in the appendix. Here’s a snippet:
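
A minimal sketch, again assuming the default 16KB page size; the offsets follow the header sizes listed above:

#include <arpa/inet.h> /* ntohl() */
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstring>

static const size_t PAGE_SIZE       = 16384; /* default InnoDB page size */
static const size_t FIL_PAGE_DATA   = 38;    /* file page header size    */
static const size_t FSP_HEADER_SIZE = 112;   /* file space header size   */
static const size_t XDES_STATE      = 20;    /* 8-byte segment id plus
                                                12-byte list node        */

int main(int argc, char** argv) {
    unsigned char page[PAGE_SIZE];
    FILE* f = fopen(argv[1], "rb"); /* the ibd file; argc is unchecked! */
    fread(page, 1, PAGE_SIZE, f);   /* page 0 is the first xdes page */

    /* Skip the page header and the space header to reach the xdes array. */
    const unsigned char* xdes = page + FIL_PAGE_DATA + FSP_HEADER_SIZE;

    uint32_t state;
    memcpy(&state, xdes + XDES_STATE, sizeof(state));
    state = ntohl(state); /* the on-disk data is in network byte order */

    printf("state of the first extent descriptor: %u\n", state);
    fclose(f);
    return 0;
}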

In the above sample code, the first xdes page (with page number 0) is read into memory. The two headers (the page header and the space header) are skipped and the state field of the first extent descriptor is read. The byte order conversion is then performed and the state of the extent descriptor is printed.

I will leave it as an exercise for the reader to print the complete extent descriptor. :)

Conclusion

This article provided information about the extent descriptor (xdes) pages in InnoDB. The free and used pages within data files (ibd files) can be identified by making use of the allocation bitmaps within the xdes pages.

I hope that this has been interesting and proves helpful! Please let me know if you have any questions or comments. As always, THANK YOU for using MySQL!


Appendix: The Complete Sample Program

For the convenience of the reader, I am providing a complete compilable code sample by combining the code snippets above. I have hard-coded the page size to the default of 16KB. I have tested it on Ubuntu 14.10 (Utopic Unicorn).
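
A sketch combining the pieces above (error handling is deliberately omitted, as in the snippets):

#include <arpa/inet.h> /* ntohl() */
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstring>

static const size_t PAGE_SIZE       = 16384; /* hard-coded 16KB page size */
static const size_t FIL_PAGE_DATA   = 38;    /* file page header size     */
static const size_t FSP_HEADER_SIZE = 112;   /* file space header size    */

struct __attribute__((__packed__)) fil_addr_t {
    uint32_t page;
    uint16_t boffset;
};

struct __attribute__((__packed__)) flst_node_t {
    fil_addr_t prev;
    fil_addr_t next;
};

struct __attribute__((__packed__)) xdes_t {
    uint64_t    fseg_id;
    flst_node_t flst_node;
    uint32_t    state;
    uint8_t     bitmap[16];
};

int main(int argc, char** argv) {
    unsigned char page[PAGE_SIZE];
    FILE* f = fopen(argv[1], "rb"); /* argc is deliberately unchecked! */
    fread(page, 1, PAGE_SIZE, f);   /* page 0: the first xdes page */

    /* Copy the first extent descriptor out of the xdes array. */
    xdes_t xdes;
    memcpy(&xdes, page + FIL_PAGE_DATA + FSP_HEADER_SIZE, sizeof(xdes));

    /* Convert from network (big-endian) byte order before printing. */
    printf("state of the first extent descriptor: %u\n", ntohl(xdes.state));
    fclose(f);
    return 0;
}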

Disclaimer: The code snippets and sample program in this article are provided for educational purposes only. While I tested the code on my laptop, it is not guaranteed to work on all platforms. Also, while the sample program above uses GCC’s __attribute__((__packed__)), InnoDB itself does not; using the packed attribute could potentially introduce data alignment issues, making the code somewhat platform dependent.

How to Use SSL and MySQL Client Library in the Same Binary!

We plan to hide (not export) the symbols of the SSL library used by the MySQL client library. As a first step in that effort, we plan to hide all of the non-documented symbols, because we want to stop polluting the MySQL client program’s namespace.

Consider a requirement where the OpenSSL and MySQL client library functions are both used directly to build a binary. If the link order used is 1. the MySQL client library (i.e. libmysqlclient) and then 2. the OpenSSL library (i.e. libcrypto), then running the resulting binary will result in a crash.

The reason why it will crash is as follows: the libmysqlclient library has the yaSSL library built in and has exported its symbols. The yaSSL and OpenSSL libraries share some of the same symbol names, so an executable linked in the order described above will resolve the OpenSSL symbol names through the libmysqlclient library (yaSSL) rather than through the OpenSSL library directly. Hence, the binary will end up calling functions defined in the libmysqlclient library instead of the desired OpenSSL functions, thus resulting in a crash.

We plan on solving this and similar issues as follows: hide (do not export) all of the bundled SSL library symbols from libmysqlclient, and also hide all of the undocumented MySQL symbols in libmysqlclient. The final list of symbols that will be exported can be found here.

That’s all for now. I really hope that this new feature makes users’ lives easier and improves their overall experience with MySQL! We also look forward to your feedback on this new feature! You can leave a comment here on the blog post or in a support ticket. If you feel that you encountered any related bugs, please do let us know via a bug report.

As always, THANK YOU for using MySQL!

Relay Log Recovery when SQL Thread’s Position is Unavailable

This blog explains how relay log recovery happens in a scenario where an applier thread (SQL_Thread) is starting for the first time and its starting position is not available for relay log recovery operations. If you are using GTIDs with MASTER_AUTO_POSITION then the following is more or less irrelevant since we then employ a more resilient repositioning scheme. The potential issue described here will also not occur if you have employed crash-safe replication settings, including --sync_master_info=1. With those disclaimers out of the way, let’s proceed.

Background

A crash-safe slave in MySQL 5.6 guarantees that replication progress information is always in sync with what has actually been applied to the local database instance. But there can still potentially be a few cases where the synchronization is not 100% guaranteed, for example in the case of DDL statements (which are not yet transactional).

In order to achieve crash-safe replication you must do the following:
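
In MySQL 5.6 this essentially boils down to settings along these lines (a minimal my.cnf sketch; master_info_repository=TABLE is not strictly required but is commonly set alongside):

[mysqld]
relay_log_recovery = ON            # discard relay logs and refetch after a restart
relay_log_info_repository = TABLE  # store the applier position transactionally
master_info_repository = TABLE     # store the receiver position in a table too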

Crash-safe replication ensures that the applier thread saves its relevant state information at transaction commit time. During recovery (when relay-log-recovery=1), any existing relay logs are discarded and new ones are created. The receiver thread (IO_Thread) then starts fetching events from the master’s binary log beginning at the point that the applier thread(s) (SQL_Thread) had already applied on the slave. In other words, the receiver thread’s Read_Master_Log_Pos is set to the applier thread’s Exec_Master_Log_Pos. (You can learn more about what those positions mean here.) So in practice the receiver thread copies the applied position from the applier thread and starts pulling the master’s binary log events from that point forward. The applier thread’s Relay_Log_File and Relay_Log_Pos values are then set to the new relay log file and position.

In MySQL 5.6.22, we made this procedure even smarter. If an applier thread does not report any position at all (e.g., it had not yet been started when the crash happened), then the slave tries to mine the relay logs for the first rotate event from the master and sets the slave’s file and position info based on that event instead.

Problem

The relay log recovery process explained above is entirely dependent on an applier thread’s position. There could, however, be a scenario where an applier thread was never started while the receiver thread was. In that case the receiver thread may have downloaded events from the master’s binary log and appended them to the slave’s relay log(s). If a crash happens in this scenario and the server is restarted, then even with crash-safe settings enabled the automatic recovery could still fail, because the recovery process expects Relay_Master_Log_File to be set in order to start its work. If that value is not set, then the recovery process will finish without doing any recovery work.
Crash-safe replication ensures that the applier thread’s state is saved at transaction commit time, whereas the same is not guaranteed for the receiver thread’s state. That means the receiver thread’s position info is not updated as each event is pulled from the master, and hence the slave_master_info table could potentially be out of date. This could then cause events to be fetched once again and appended (for a second time) to the slave’s relay log(s). When the applier thread is then started post-recovery, this could result in duplicate key errors on the slave. This issue was reported as Bug#73039 (fixed in MySQL 5.6.22).

When ONLY_FULL_GROUP_BY Won’t See the Query Is Deterministic…

This is a follow-up post to my recent announcement of only_full_group_by improvements in 5.7, where I’d like to address some tricky corner cases where GROUP BY contains non-column expressions (functions).

In the SQL standard, GROUP BY should contain only table columns and never expressions. So this example query where we want to count how many people have the same name is illegal in the SQL2011 standard because the GROUP BY clause contains a function:
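
A query along these lines, using a hypothetical people table with first_name and last_name columns (the former appears in the error message later in this post):

SELECT CONCAT(first_name, ' ', last_name), COUNT(*)
FROM people
GROUP BY CONCAT(first_name, ' ', last_name);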

But MySQL does allow expressions in a GROUP BY clause, and it’s a very convenient addition to the standard. The logic of the only_full_group_by SQL mode notices that the selected expression is equal to the grouping expression, so it concludes that the result is not random and thus accepts the query.

However, let’s try this:
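
That is, wrapping the selected expression in UPPER() while still grouping on the bare CONCAT(), something like:

SELECT UPPER(CONCAT(first_name, ' ', last_name)), COUNT(*)
FROM people
GROUP BY CONCAT(first_name, ' ', last_name);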

As you’ll see, it returns:
Expression #1 of SELECT list is not in GROUP BY clause and contains nonaggregated column 'test.people.first_name' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by
Indeed, the selected expression is not equal to the grouping one anymore. A human brain instantly recognizes that it’s still deterministic, as it’s simply the UPPER() value of the grouping expression, but for MySQL to recognize it too it would have to analyze the selected expression’s parse tree, trying to spot the grouping expression at all possible places in that tree. As you can imagine, the selected expression could be rather complex. For example:
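
Something like the following sketch, which the walk-through below dissects:

SELECT UPPER(CONCAT(first_name, ' ', last_name)),
       SUBSTR(CONCAT(first_name, ' ', last_name),
              0.6 * LENGTH(CONCAT(first_name, ' ', last_name))),
       COUNT(*)
FROM people
GROUP BY CONCAT(first_name, ' ', last_name);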

MySQL would have to:

  • Start with UPPER:
    • See if the UPPER value is the CONCAT of GROUP BY — no it’s not
    • So go down to arguments of UPPER
    • There it finds CONCAT, which is equal to the GROUP BY one — OK!
  • Now check SUBSTR:
    • It’s not CONCAT
    • So go down to the arguments
    • CONCAT is there in the first argument — fine
    • The second argument is a multiplication, which is not CONCAT
      • So go down to its arguments
      • We have 0.6, which is not CONCAT but rather a constant and deterministic
      • Then LENGTH, which is not CONCAT
      • So let’s check its arguments
        • CONCAT is there, good. Finally done!

All these comparisons of functions have a runtime cost and an implementation cost, so we decided not to go down this path. After all, only_full_group_by had always rejected such a query—which is a non-Standard one—so we chose not to address these specific cases.

Thus, MySQL simply observes that the selected expression is not the grouping one and then it collects columns referenced by the selected expression to see if they’re equal to, or functionally dependent on, some grouping columns. But if there’s no grouping column — bail out!

One could take a step back and realize that grouping on first_name and last_name (the columns) is more natural than grouping on CONCAT, and it also yields a Standard-compliant query:
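
For example:

SELECT UPPER(CONCAT(first_name, ' ', last_name)), COUNT(*)
FROM people
GROUP BY first_name, last_name;

Here the selected expression is built only from the grouped columns, so it is functionally dependent on them and the query is accepted.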

But what if you do have a query which really must group on an expression (likely something less trivial than my examples)? We saw that UPPER and GROUP BY CONCAT don’t fit together. A first idea is thus to group on a column; for that, the Standard-compliant solution is to use a derived table:
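
Along these lines (a sketch, with dt as the derived table):

SELECT UPPER(conc), COUNT(*)
FROM (SELECT CONCAT(first_name, ' ', last_name) AS conc
      FROM people) AS dt
GROUP BY conc;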

It makes CONCAT’s output a column of the derived table and thus the grouping happens on this column and finally UPPER uses this column value.

However, the derived table will have as many rows as people; as it will be materialized internally it will form a big temporary table, which creates space and speed issues. It would be better if the derived table contained only the groups. So we’re set for grouping on CONCAT, keeping UPPER separated from the grouping query:
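
That is, something like:

SELECT UPPER(conc), cnt
FROM (SELECT CONCAT(first_name, ' ', last_name) AS conc, COUNT(*) AS cnt
      FROM people
      GROUP BY CONCAT(first_name, ' ', last_name)) AS dt;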

Another potential solution, albeit a less safe one, is to add ANY_VALUE around UPPER() (or even turn only_full_group_by off!):
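
For example:

SELECT ANY_VALUE(UPPER(CONCAT(first_name, ' ', last_name))), COUNT(*)
FROM people
GROUP BY CONCAT(first_name, ' ', last_name);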

One case I have seen recently, brought up by Roland Bouman, is when one wants to sort the output in a particular order, for example one wants NULLs first, so adds ISNULL in ORDER BY:
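
Something like this (ISNULL() returns 1 for NULL, so sorting on it DESC puts NULLs first):

SELECT CONCAT(first_name, ' ', last_name), COUNT(*)
FROM people
GROUP BY CONCAT(first_name, ' ', last_name)
ORDER BY ISNULL(CONCAT(first_name, ' ', last_name)) DESC;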

This time, the selected expression is equal to that of GROUP BY, so it has no problem; but the ISNULL expression is the new problem! To ensure a deterministic order of results, only_full_group_by has to validate this ISNULL, and, just like for the previous UPPER call, it will not see that the argument of the ISNULL call is the same as the grouping expression. Here, because the problem is in the ORDER BY clause, and because ORDER BY clauses are allowed to reference expressions of the select list by aliases in MySQL, the simplest solution is to simply use an alias:
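
For example, with the alias CONC:

SELECT CONCAT(first_name, ' ', last_name) AS CONC, COUNT(*)
FROM people
GROUP BY CONC
ORDER BY ISNULL(CONC) DESC;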

MySQL will see that ISNULL’s argument is an alias (CONC) and that it matches the same alias in the GROUP BY clause; the alias is then treated as a “grouping pseudo-column”. Matching aliases is much simpler than matching expressions, so we have been able to implement that in the only_full_group_by logic. Thus the above query will pass the only_full_group_by restrictions. It’s also much more readable with aliases than without, in my humble opinion; it might even execute slightly faster, as the value of CONC, computed in GROUP BY, is now reused by ISNULL, leading to fewer calls to the CONCAT function. The only potential drawback is that using aliases in GROUP BY clauses is not Standard-compliant, so it may cause issues if you need to support various DBMSs with your application.

I hope that this blog post has given you enough safe and usable tips for those types of queries. If you know of other tricks, they are warmly welcome, and you might just drop them in a comment here (thanks in advance!).

As always, THANK YOU for using MySQL!

December London MySQL Meetup

The London MySQL Meetup Group hosted a new evening of MySQL related talks on December 11, this time in Canonical’s offices in the Blue Fin Building. Conveniently, it was timed to coincide with the Debian/Ubuntu packaging sprint, which I also attended.

We started with pizza, beer and small talk. Since this was my first meetup in London, there were lots of new and interesting people to meet and talk to.

The topic of the evening was “MySQL distributions: in the cloud and on bare metal”, but before we got that far, James Page and Robie Basak presented the results so far of our ongoing packaging sprint. I’ve covered that in another blog post, and Akhil discusses it here as well.

The first on-topic talk was Ivan Zoratti’s talk on databases in the cloud in general, and on Amazon Aurora in particular. We currently don’t know very much about the inner workings of Aurora, except that it must be running MySQL 5.6 with some modifications to the lower layers. So it’s mostly speculation, but it was fun to discuss what they might have done to achieve what they have. Let’s call it well-educated guessing.

George Lorch from Percona presented on OpenStack and Trove. For those of us not involved with OpenStack in our daily work, it was a nice introduction to what Trove is and isn’t.

Finally, I was also asked to give a short talk on new features in the MySQL 5.7.5 DMR. It was just a quick review of some of the features, based on Geir’s blog post on the same topic. I noted that there was some interest in what we’ve done to make InnoDB temporary tables faster.

I usually attend MySQL meetups in Trondheim, Norway, so it was fun to see how they do things in London. I guess there aren’t that many ways to do it when the biggest difference I could find was that the pizza is served before the talks in London while we do it afterwards in Trondheim. I just wish we had a pub nearby in Trondheim so that we could copy the tradition of going for a pint afterwards.

A big thank you to everyone that came, and to the organizers for inviting me!

New Option to Stop the Server If Binlogging Fails in MySQL 5.6

In this post I will try to explain the new MySQL binlog_error_action server option. This new option is available from MySQL 5.6.22 and later.

Background:
——————–
As part of MySQL replication, all data changes that happen on the master server are recorded into a binary log so that they can be sent to the slaves and replayed there. If an error occurs that prevents mysqld from writing to the binary log (disk full, read-only file system, etc.) then binary logging is simply disabled and operations continue on the master. This error mainly occurs during rotation of the binary log or while opening a binary log file.

This problem creates a serious potential for data loss within a replication group. When a master hits this failure in production, all downstream replication clients stop receiving replication events. The master will not store binlog events for committed transactions to its binary log and consequently there will be no new events in the binary log to send to replication clients. If that master then fails, then all the transactions the master received while binlogging was turned off are lost forever. This can lead to out of sync slaves and improper backups.

Error message when the file system becomes read-only:

As part of the bug fix for Bug#51014, the binlog_error_action server option was introduced. Using this option a user can choose either to ignore the error (still the default, so as to avoid behavior changes in GA releases) or to abort the server. The IGNORE_ERROR option value refers to the default behavior (as described above) where binlogging will simply be disabled and the master will continue with its normal operations.

However, the ABORT_SERVER option value will cause the server to exit when binlogging operations fail. At the time of the resulting server exit a fatal error is pushed to clients and the server will shut down. The error details being:

Specifying the option:
———————————–
This option can be specified as a startup option (either on the command-line or in a config file) or dynamically in the running server using the SET command:
mysql> SET GLOBAL binlog_error_action=ABORT_SERVER;
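
It can equally be set at startup, for example:

# in my.cnf
[mysqld]
binlog_error_action = ABORT_SERVER

# or on the command line
mysqld --binlog_error_action=ABORT_SERVER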

Demonstration of the new option in the case of a read-only file system:
——————————————————————————————————————
Step 1: SET GLOBAL binlog_error_action = ABORT_SERVER;
Step 2: Make your file system read-only
Step 3: FLUSH LOGS;

Summary:
——————
If an error occurs that prevents mysqld from writing to the binary log, the existing behavior is that binary logging is disabled and the server continues with its normal operations. Now, with the new binlog_error_action server option, the user can choose either to ignore the error (IGNORE_ERROR) or to abort the server (ABORT_SERVER) when binary logging failures occur. This optional behavior was first introduced in MySQL 5.6.20 using the binlogging_impossible_mode server option. That name is deprecated as of MySQL 5.6.22, and the option is now called binlog_error_action.

We look forward to your feedback on this new feature! If you have any questions or encounter any bugs, please do let us know by opening a support ticket or filing a bug. As always, THANK YOU for using MySQL!

MySQL Debian/Ubuntu packaging sprint

Debian/Ubuntu packaging sprint participants. From left: James Page, Norvald H. Ryeng, George Lorch, Akhil Mohan, Otto Kekäläinen, Robie Basak.

Last week, Canonical invited the MySQL packaging team in Debian to a packaging sprint in their London office, and most of us were able to participate. We’ve met online on IRC and UOSs before, but this was the first time we were all in the same room.

The results of our sprint will soon be available in a .deb near you. Since Debian Jessie is currently in feature freeze, most of it will hit Ubuntu first. The two main things we achieved on the MySQL side were to make MySQL 5.6 ready for Ubuntu Vivid (15.04) and to split MySQL, Percona and MariaDB configuration files. The configuration file split reduces coupling between MySQL and variant packages, and it is key to solving most of the challenges the team faces. We can now do packaging work on one variant without coordinating with all the others. That is a major achievement!

In addition to this, we also made smaller improvements: parallel builds and MTR test runs, defined and implemented a test plan in order to run the right tests at the right time (build vs. DEP8), and cleanup of the debian/rules file. Not all of this is visible to the end user, but to the package maintainers it’s a big deal.

Time flies when you’re having fun, so when the week was over, we still had more we’d like to do with the packages. The work will continue, but at least most of the heavy lifting is now done. I don’t think we ever would have got this far without meeting in person.

Let me end by extending my thanks to Canonical for organizing and hosting this sprint! It was really useful and fun! I think we all achieved and learned a lot. So once again, to Canonical and everybody that participated, thank you, and merry Christmas!

Some Notes on Index Statistics in InnoDB

In MySQL 5.6 we introduced a huge improvement in the way that index and table statistics are gathered by InnoDB and subsequently used by the Optimizer during query optimization: Persistent Statistics. Some aspects of the way that Persistent Statistics work could be improved further though, and we’d really like your input on that.

How much to sample?

The statistics are gathered by picking some pages semi-randomly, analyzing them, and deriving some conclusions about the entire table and/or index from those analyzed pages. The number of pages sampled can be specified on a per-table basis with the STATS_SAMPLE_PAGES clause. For example: ALTER TABLE t STATS_SAMPLE_PAGES=500;
This way you can request that more pages be sampled for larger tables in order to make the estimates more accurate. The more pages that are sampled, the better the estimates will be and thus the better the chance that the Optimizer will be able to choose optimal query plans, with the only downside being that the index statistics gathering operation (the sampling) itself becomes slower.

Being able to specify the number of pages to sample on a per-table basis is flexible, but what if one could instead specify the value as a percentage of the table size rather than as an absolute number? Would that make sense? Setting such a global value would not meet every need and use case, because a given percentage may not be sufficient to get a good estimate on a small table, while the same percentage may make the sampling process too slow for a very large table. Setting the percentage on a per-table basis may also have little advantage over the existing functionality, as one already knows whether the table is huge or tiny and can set an appropriate absolute number of pages to sample. What do you think? We would love to get your input on this! Do you see a lot of value in being able to set the number of pages to sample as a percentage of the table size, or to have it depend on the table size in some other way? Cast your vote:

Make it possible to set the amount of pages to ANALYZE as a % of the table size?


How often to sample?

Currently, if 10% of the table has changed (rows updated, deleted, or inserted) since the previous statistics gathering operation then InnoDB will automatically trigger a recalculation of the statistics (it will analyze some leaf pages as discussed in the previous paragraph, reading them from disk if they are not already cached).

In MySQL 5.5 and older, where the auto recalculation threshold was set to 6.25%, the statistics were updated too often, which caused too many execution plan changes. In general, we’ve found that 10% is a good overall threshold, and we haven’t seen a great need to make it dynamic. But do you agree? Do you want to be able to change that value globally or even on a per-table basis? Would you increase it or decrease it if you could? Cast your vote:

Make it possible to change the 10% threshold that triggers an automatic ANALYZE?


We look forward to your feedback! And as always, THANK YOU for using MySQL!