When ONLY_FULL_GROUP_BY Won’t See the Query Is Deterministic…

This is a follow-up post to my recent announcement of only_full_group_by improvements in 5.7, where I’d like to address some tricky corner cases where GROUP BY contains non-column expressions (functions).

In the SQL standard, GROUP BY should contain only table columns and never expressions. So this example query where we want to count how many people have the same name is illegal in the SQL2011 standard because the GROUP BY clause contains a function:

But MySQL does allow expressions in a GROUP BY clause, and it’s a very convenient addition to the standard. The logic of the only_full_group_by SQL mode notices that the selected expression is equal to the grouping expression, so it concludes that the result is not random and thus accepts the query.

However, let’s try this:

As you’ll see, it returns:
Expression #1 of SELECT list is not in GROUP BY clause and contains nonaggregated column 'test.people.first_name' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by"
Indeed, the selected expression is not equal to the grouping one anymore. A human brain instantly recognizes that it’s still deterministic, as it’s simply the UPPER() value of the grouping expression, but for MySQL to recognize it too it should analyze the selected expression’s parse tree, trying to spot the grouping expression at all possible places in that tree. As you can imagine, the selected expression could be rather complex. For example:

MySQL would have to:

  • Start with UPPER:
    • See if the UPPER value is the CONCAT of GROUP BY — no it’s not
    • So go down to arguments of UPPER
    • There it finds CONCAT, which is equal to the GROUP BY one — OK!
  • Now check SUBSTR:
    • It’s not CONCAT
    • So go down to the arguments
    • CONCAT is there in the first argument — fine
    • The second argument is a multiplication, which is not CONCAT
      • So go down to its arguments
      • We have 0.6, which is not CONCAT but rather a constant and deterministic
      • Then LENGTH, which is not CONCAT
      • So let’s check its arguments
        • CONCAT is there, good. Finally done!

All these comparisons of functions have a runtime cost and an implementation cost, so we decided not to go down this path. After all, only_full_group_by had always rejected such a query—which is a non-Standard one—so we chose not to address these specific cases.

Thus, MySQL simply observes that the selected expression is not the grouping one and then it collects columns referenced by the selected expression to see if they’re equal to, or functionally dependent on, some grouping columns. But if there’s no grouping column — bail out!

One could take a step back and realize that grouping on first_name and last_name (the columns) is more natural than grouping on CONCAT, and it also yields a Standard-compliant query:

But what if you do have a query which really must group on an expression (likely something less trivial than my examples)? We saw that UPPER and GROUP BY CONCAT don’t fit together. A first idea is thus to group on a column; for that, the Standard-compliant solution is to use a derived table:

It makes CONCAT’s output a column of the derived table and thus the grouping happens on this column and finally UPPER uses this column value.

However, the derived table will have as many rows as people; as it will be materialized internally it will form a big temporary table, which creates space and speed issues. It would be better if the derived table contained only the groups. So we’re set for grouping on CONCAT, keeping UPPER separated from the grouping query:

Another potential solution, albeit a less safe one, is to add ANY_VALUE around UPPER() (or even turn only_full_group_by off!):

One case I have seen recently, brought up by Roland Bouman, is when one wants to sort the output in a particular order, for example one wants NULLs first, so adds ISNULL in ORDER BY:

This time, the selected expression is equal to that of GROUP BY, so it has no problem; but the ISNULL expression is the new problem! To ensure a deterministic order of results, only_full_group_by has to validate this ISNULL, and, just like for the previous UPPER call, it will not see that the argument of the ISNULL call is the same as the grouping expression. Here, because the problem is in the ORDER BY clause, and because ORDER BY clauses are allowed to reference expressions of the select list by aliases in MySQL, the simplest solution is to simply use an alias:

MySQL will see that ISNULL’s argument is an alias (CONC) and that it matches the same alias in the GROUP BY clause — the alias is then treated as a “grouping pseudo-column”. Matching aliases is much simpler than matching expressions, so we have been able to implement that in the only_full_group_by logic. Thus the above query will pass the only_full_group_by restrictions. It’s also much more readable with aliases than without, in my humble opinion; it might even execute slightly faster, as the value of CONC, computed in GROUP BY, is now reused by ISNULL, leading to less calls to the CONCAT function. The only potential drawback is that using aliases in GROUP BY clauses is not Standard-compliant, so it may cause issues if you need to support various DBMS options with your application.

I hope that this blog post has given you enough safe and usable tips for those types of queries. If you know of other tricks, they are warmly welcome, and you might just drop them in a comment here (thanks in advance!).

As always, THANK YOU for using MySQL!

7 thoughts on “When ONLY_FULL_GROUP_BY Won’t See the Query Is Deterministic…

  1. I think you made a mistake in not recognizing expressions. I acknowledge that it is not standard behavior. But it does not conflict with the standard. And Oracle has done it since at least 1990, so one could argue that it is the de facto standard.

    Your argument “All these comparisons of functions have a runtime cost and an implementation cost” doesn’t hold up. Oracle was able to implement it efficiently on much slower hardware. It is simple structural matching of trees, which is O(n) in the size of the tree if you have a half-decent hash-code.

    Your suggested workaround to ISNULL forces people to reference aliases from HAVING and ORDER BY — which is itself not standard!

    For the record, “GROUP BY first_name, last_name” does not necessarily produce the same number of rows as “GROUP BY CONCAT(first_name, ‘ ‘, last_name)”. So that workaround doesn’t work.

    Also, please implement NULLS FIRST/ NULLS LAST already so we can stop using the ISNULL workaround.

  2. @Julian: Thank you for using MySQL.
    – NULLS FIRST / LAST: indeed a frequently requested one, filed at
    – recognizing that selected expressions are functions of non-column group expressions: indeed MySQL has never supported it, while Oracle does. If MySQL’s behaviour is an issue, turning only_full_group_by off or using any_value() works around it. On the other hand, MySQL now supports other functional dependency detections (pk-based, equality-based, …)
    while Oracle doesn’t.

    1. I’m a fan of MySQL’s functional dependency support. I’d argue that since you go beyond the standard and allow expressions in GROUP BY then it would be nice to deduce functional dependencies on expressions. Suppose you have ‘x + y’ in the GROUP BY clause. ‘ISNULL(x + y)’ is functionally dependent on ‘x + y’ and it is both well-founded and useful to allow it in the SELECT and ORDER BY clauses.

Leave a Reply