* [Update of April 2017: the features discussed here are now also available in the official release 8.0.1 of MySQL.]*

Here is the second in a series of posts about CTEs, a new feature of MySQL 8.0, available in this Labs release. Here is the first post; there is also a third post. The first one ended with:

*Inside the recursive CTE definition (the part in AS (…)), some syntax constraints must be respected […]*

*a recursive SELECT mustn’t contain GROUP BY, aggregate functions*

*(like SUM), ORDER BY, LIMIT, DISTINCT (this rule doesn’t apply to the non-recursive/anchor/seed SELECT)**a recursive SELECT must reference the CTE only once and only in its*

*FROM clause, not in any subquery. Of course, it can additionally reference other tables than the CTE and join them with the CTE, which can be very useful to build hierarchies (for example, if we have a table of bosses and employees and want to answer the question “who reports directly or indirectly to Mrs. X?”). If used in a JOIN like this, the CTE must not be on the right side of a LEFT JOIN.*

So it is time to give some justifications for these constraints, which come from the SQL standard (except the exclusion of ORDER BY, LIMIT and DISTINCT, which is MySQL-specific, though also present in several well-known RDBMSs). Consider this example, producing numbers from 1 to 10:

1 2 3 4 5 6 7 |
WITH RECURSIVE my_cte AS ( SELECT 1 AS n UNION ALL SELECT 1+n FROM my_cte WHERE n<10 # <- recursive SELECT ) SELECT * FROM my_cte; |

The recursive SELECT looks like it’s operating on the **set** of rows of my_cte, as it selects from my_cte. But in fact, it must act as if it operated **on each row**, one by one, in turn, **in isolation** from any other row of my_cte. You have to view the recursive SELECT as a process which, from **one** row of iteration N, without knowledge of other rows of iteration N, produces some new rows for iteration N+1, and repeats the job for another row of iteration N, until it has processed all rows of iteration N one by one, after which it processes one row of iteration N+1, and so on.

With that rule in mind, let’s realize that:

- GROUP BY needs all rows to put them into groups,
- So does an aggregate function like SUM,
- ORDER BY needs all rows to sort them,
- LIMIT needs to eliminate rows based on the count of other rows,
- DISTINCT needs to eliminate rows based on the existence of other rows,
- Referencing my_cte twice in the FROM clause needs a row from my_cte and another row from my_cte, to join them,
- Referencing my_cte in the FROM clause and in a subquery needs a row from my_cte for FROM and other rows of my_cte to evaluate the subquery,
- Referencing my_cte in the right side of a LEFT JOIN needs all rows of my_cte, to know if at least one joins with the left side or if a NULL complement must be made.

Given that the recursive SELECT has access to only one row at a time, it is clear why all cases above cannot be allowed in a recursive SELECT.

This SQL rule has two advantages: first, it gives database systems the freedom to generate rows in any order of steps; second, it forbids quite a few unreasonable queries.

To see how this influences the patterns of queries, let’s now build a recursive query to generate Fibonacci numbers, which are 1, then 1, then 1+1 (=2), then 1+2 (=3), then 2+3 (=5), then 3+5 (=8) – the (N+2)th number is the sum of the (N+1)th and the (N)th.

A first approach would be: put the two first numbers, 1 and 1, in a seed SELECT:

1 2 3 |
SELECT 1 as f # first number; "f" stands for Fibonacci number UNION ALL SELECT 1 # second number |

then generate all other numbers with

1 2 3 |
SELECT cte_n.f + cte_n_plus_1.f FROM my_cte AS cte_n JOIN my_cte AS cte_n_plus_1 ON ... which condition to put here? |

My intention is: grab the (N)th and (N+1)th numbers, which are already in my_cte, and sum them; so: grab a row from my_cte, containing the (N)th number, and a row from my_cte, containing the (N+1)th number, and sum them; to grab the two rows at the same time I use a SQL JOIN, but I wonder how to make sure the JOIN pairs the right numbers. I need to add, as ON condition, something like “the row of cte_n_plus_1 must be the number right after the row of cte_n”. Otherwise I’ll also be summing the 5-th and the 9-th numbers, or whatever! So I need to add ordinal numbers to the table. Ok, so let’s add ordinal numbers to the seed SELECT:

1 2 3 4 |
SELECT 1 as f, 1 as n UNION ALL SELECT 1, 2 # "f" : Fibonacci number; "n": ordinal number, # first row: has value 1 and is 1st number # second row: has value 1 and is 2nd number |

and now we can write this recursive SELECT:

1 2 3 |
SELECT cte_n.f + cte_n_plus_1.f, cte.n + 2 FROM my_cte AS cte_n JOIN my_cte AS cte_n_plus_1 ON cte_n.n + 1 = cte_n_plus_1.n # Pair (N)th with (N+1)th to add them |

Alas, this won’t work, as it’s referencing my_cte twice in the FROM clause, which is forbidden as it violates the rule which says: *“handle one row at a time”*. Thus, I have to change my query so that all necessary information to calculate a new number is found in one row and doesn’t require two rows; for that, I’ll just store the N-th and (N+1)th number in one row:

1 2 3 4 |
SELECT 1 as f, 1 as next_f # "f" : Fibonacci number of this row, # "next_f": next Fibonacci number after "f", # no ordinal number needed anymore |

and use this recursive SELECT:

1 |
SELECT next_f, f+next_f FROM my_cte |

Putting it all together, and adding a condition to stop around 500:

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 |
WITH RECURSIVE my_cte AS ( SELECT 1 as f, 1 as next_f UNION ALL SELECT next_f, f+next_f FROM my_cte WHERE f < 500 ) SELECT * FROM my_cte; +------+--------+ | f | next_f | +------+--------+ | 1 | 1 | | 1 | 2 | | 2 | 3 | | 3 | 5 | | 5 | 8 | | 8 | 13 | | 13 | 21 | | 21 | 34 | | 34 | 55 | | 55 | 89 | | 89 | 144 | | 144 | 233 | | 233 | 377 | | 377 | 610 | | 610 | 987 | +------+--------+ |

That’s it; the **f** column contains the desired Fibonacci sequence.

While we saw above that a recursive CTE can’t be joined with itself, it can perfectly be joined with other tables. Let’s use that to produce all 4-digit bit strings:

1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
WITH RECURSIVE digits AS ( SELECT '0' AS d UNION ALL SELECT '1' ), strings AS ( SELECT '' AS s UNION ALL SELECT CONCAT(strings.s, digits.d) FROM strings, digits WHERE LENGTH(strings.s) < 4 ) SELECT s FROM strings WHERE LENGTH(s)=4; |

A first, non-recursive CTE, **digits**, holds the two building blocks 0 and 1. Then the recursive CTE, **strings**, starts with an empty string, then concatenates the result with the 0 and 1 of **digits**, and so on, so the expected result is: empty string, 0, 1, 00, 01, 10, 11, 000, etc, up to 1111.

The real result is *ERROR 1406 (22001): Data too long for column ‘s’ at row 2*.

At this point you may want to read towards the end of the previous blog post (search for “longer and longer”) which mentions that the type of column **strings.s** is determined from the non-recursive SELECT only i.e. *SELECT ” AS s* : and the type of the empty string (*”*) is CHAR(0)… Inserting ‘0’ into CHAR(0) would cause a character truncation, so: error!

A detail though: it fails because I’m using strict mode, which is the default starting from MySQL 5.7, and because strict mode hates truncation. My statement is a SELECT, and strict mode usually sends only warnings if a truncation happens during SELECT (whereas it sends errors during INSERT/UPDATE/DELETE), but we think it makes sense that SELECT become as strict as other statements in the future, and WITH RECURSIVE is one such case, so: error!

Back to our problem, I just have to make the column wider with a CAST and now it works:

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 |
WITH RECURSIVE digits AS ( SELECT '0' AS d UNION ALL SELECT '1' ), strings AS ( SELECT CAST('' AS CHAR(4)) AS s UNION ALL SELECT CONCAT(strings.s, digits.d) FROM strings, digits WHERE LENGTH(strings.s) < 4 ) SELECT * FROM strings WHERE LENGTH(s)=4; +------+ | s | +------+ | 0000 | | 1000 | | 0100 | | 1100 | | 0010 | | 1010 | | 0110 | | 1110 | | 0001 | | 1001 | | 0101 | | 1101 | | 0011 | | 1011 | | 0111 | | 1111 | +------+ 16 rows in set (0,00 sec) |

With the examples above, you now have material to “program” many number-generating or string-generating queries. In the next post, we will see how to use recursive queries to explore hierarchical data (X-is-a-child-of-Y).

And, as always:

1 2 3 4 5 6 |
select from_base64('VGhhbmsgeW91IGZvciB0cnlpbmcgcmVjdXJzaXZlIENURXMh') as final_words; +--------------------------------------+ | final_words | +--------------------------------------+ | Thank you for trying recursive CTEs! | +--------------------------------------+ |

In the Fibonacci example I find it interesting to note that the number 610 is generated, although the WHERE clause should stop at f<500. But the WHERE clause refers to the input row. The output row is still generated, but will not serve as input for the next recursion step. A bit confusing at first…

That’s right, Mario.