Removal of implicit and explicit sorting for GROUP BY

In MySQL, historically GROUP BY was used to provide sorting as well. If a query specified GROUP BY, the result was sorted as if ORDER BY was present in the query.

MySQL here implicitly sorts the results from GROUP BY (i.e. in the absence of ASC or DESC designators for GROUP BY columns ).

MySQL also supported explicit sorting with GROUP BY (i.e. by using explicit ASC or DESC designators for GROUP BY columns).

This has changed in 8.0 as it no longer supports either implicit or explicit sorting for GROUP BY. In this blog post, I will explain why this change became necessary and also the work done as a precursor to this change.

GROUP BY in MySQL

To group a set of rows, MySQL optimizer chooses different methods. One of them is to sort the rows before grouping them. This makes it easy to group one group after another. It also becomes inexpensive if there is an index that could be used to get sorted rows. If there is no index, MySQL optimizer could still decide to do external (filesort) sorting before grouping.

As seen in the example, before adding index to the table, MySQL uses external sorting to do GROUP BY. For the example query I have forced the plan by using SQL_BIG_RESULT (as MySQL will not choose this plan for the dataset we have). But MySQL would use this plan to group in the absence of an index to get sorted rows and using temporary table becomes costly because of large number of groups. Once the index is added, it resorts to using the index to do GROUP BY.

But having sorted rows before grouping is not a necessity.  Optimizer can decide to make use of a temporary table to do it. Each row in this table would be a grouped row and with each incoming row, the row corresponding to that group in the table is updated. Sorting is not needed here. However,  as GROUP BY in MySQL was expected to sort,  it was forced to sort the grouped rows even in this case .

In the example query, we can see that although temporary table is used, MySQL still does external sorting. Users would have to explicitly specify ORDER BY NULL to let MYSQL know that GROUP BY need not have to sort.  So a non-standard (ORDER BY NULL) syntax was needed to counteract the effect of another non-standard extension (GROUP BY sorting). It’s much cleaner now that we have eliminated that messiness.

Removal of implicit sorting for GROUP BY

Some time back I was trying to fix bug 71804. The reporter expected MySQL to not do the unnecessary file-sort it was doing for GROUP BY. Trying to make a patch for the bug made us realize that optimizing this particular situation is not very straight forward because of the support for implicit and explicit sorting that GROUP BY provided. So we concluded that before this optimization could be done, we should be re-factoring code related to sorting for GROUP BY.

The first step in doing that was to remove the implicit sorting for GROUP BY. As mentioned in the user manual here, it was decided to remove the support for it some time back . It has been done as part of the descending index feature in 8.0.

As seen in the example above, sorting is not performed for the query. As a result, grouped rows are not sorted in the final result. If users need sorted rows, they must specify ORDER BY in the query.

In MySQL 5.7 and below versions, users find the following warning in the manual.

GROUP BY implicitly sorts by default (that is, in the absence of ASC or DESC designators for GROUP BY columns). However, relying on implicit GROUP BY sorting (that is, sorting in the absence of ASC or DESC designators) or explicit sorting for GROUP BY (that is, by using explicit ASC or DESC designators for GROUP BY columns) is deprecated. To produce a given sort order, provide an ORDER BY clause. “

Removal of explicit sorting for GROUP BY

When it came to removal of explicit sorting, it was a bit more tricky to do it. We could not remove it unless MySQL supported ORDER BY with ROLLUP. ROLLUP with ORDER BY was not allowed in MySQL 5.7 and earlier versions. So, as an alternative, users would use GROUP BY ASC/DESC to get sorted data with ROLLUP (Although the sorting was very restrictive with super aggregate rows always placed after the rows used to compute them in case of ASC and vice versa for DESC). We had to lift this limitation before we removed the support for explicit sorting for GROUP BY.

MySQL now allows ORDER BY with ROLLUP. I have explained in detail on how to make use of this improvement here. As explained in the same blog, if users want the exact same sorting order of NULLs as that of MySQL 5.7 for ROLLUP, they should use the GROUPING() function to re-write the query in a simple way.

So in short we have done the following things as precursors to removing explicit sorting for GROUP BY.

1. Adding GROUPING() function

2. Removal of implicit sorting for GROUP BY

3. Allowing ORDER BY with ROLLUP

And finally we have removed explicit sorting for GROUP BY in MySQL 8.0.13.

We did ask for the communities opinion some time back. We concluded that users who were aware of this non-standard extension that MySQL provided were fine with it going away.

Conclusion

Although we still have some more work to do before we fix bug 71804, we are happy that we got this done. Please let us know your thoughts. Thanks for using MySQL!

About Chaithra Gopalareddy

Chaithra Gopalaredy has been working with MySQL database team since 2011. She is currently employed by Oracle, based in India, Bangalore. She is Principal Software Developer for the Optimizer Team of the MySQL database. She has a background in the database industry, working for Alcatel Lucent's DataBlitz main memory database in past.

Leave a Reply

Your email address will not be published. Required fields are marked *