MySQL 8.0 Collations: Migrating from older collations, Part 2

In my blog MySQL 8.0 Collations: Migrating from older collations I showed a query that could identify the values that might break a unique constraint when migrate your data. That query was not very efficient due to the self join of the converted values.

Now that MySQL 8.0 RC1 (MySQL 8.0.3) is out we may use Window Functions to identify those columns. The new query is simpler, more elegant and will be much more efficient than the one in the original blog, and it lists all values that have some other value that breaks the unique constraint after a conversion to utf8mb4:

How to write queries like this is described in Dag’s blog on Window Functions.

Using the example from the original blog, we get the this result:

A  synthetic test with nearly 40k strings in a latin1 column with default collation and around 750 possible offending values in utf8mb4 with default collation, run on my desktop computer, show that the original query with the self join used nearly 5 seconds, while the window function version shown here used 0.10 seconds, 50 times faster.

Note that the two queries do not generate the same result, they just solve the same problem.

Thank you for using MySQL !

About Bernt Marius Johnsen

Bernt has been working with software for more than three decades. As a developer he has worked with expert systems, compilers, distributed systems, banking software and several databases. He has been a Senior member of MySQL's Server QA team since 2010.

3 thoughts on “MySQL 8.0 Collations: Migrating from older collations, Part 2

  1. A minor detail. You name your MySQL 8.0 versions as 8.0.X. So when looking for ” MySQL 8.0 RC1″ it’s not clear which version you’re referring to. I know the previous versions are DMR (Development Milestone releases) and the RC label you’ve attached to this version indicates you think this is “almost ready for production”, hence a release candidate.

    Please when mentioning versions, refer to the version the server reports, or the packaging uses to avoid confusion. This version you mention is 8.0.3 as referenced here: I know this but the post does not make this clear.

Leave a Reply