In MySQL 8.0 we have replaced the old regular expression library with the ICU regex library. See Martin’s blog on the topic. The main goal is to get full Unicode support for regular expressions, but in addition we get a lot of neat features.…
All posts by Bernt Marius Johnsen
MySQL 8.0 Collations: Migrating from older collations, Part 2
In my blog post MySQL 8.0 Collations: Migrating from older collations I showed a query that could identify the values that might break a unique constraint when you migrate your data. That query was not very efficient due to the self join of the converted values.…
Debugging Character Set Issues by Example
In a world moving towards Unicode and UTF-8, a lot of applications still use a one-byte character set. And since one-byte character sets usually accept any byte in the range 0x00-0xFF, it often works well to store and retrieve any data in such character strings, e.g.…
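The point about one-byte character sets accepting any byte can be illustrated with a minimal sketch. It assumes Python's `latin-1` codec (ISO-8859-1) as a stand-in for a one-byte character set; it is not MySQL's own `latin1` implementation, and the sample string is made up for illustration.

```python
# Assumption: Python's "latin-1" codec stands in for a generic one-byte
# character set; MySQL's latin1 may differ in detail.

# Every byte value 0x00-0xFF decodes to some character, so decoding never fails.
all_bytes = bytes(range(256))
assert all_bytes.decode("latin-1").encode("latin-1") == all_bytes

# Hypothetical sample: UTF-8 encoded text stored as if it were one-byte data.
original = "Ærøskøbing"
utf8_bytes = original.encode("utf-8")

# The bytes are accepted and come back unchanged ...
stored = utf8_bytes.decode("latin-1")
roundtrip_ok = stored.encode("latin-1") == utf8_bytes

# ... but the stored string has more characters than the original,
# because each non-ASCII letter occupies two bytes in UTF-8.
original_length = len(original)
stored_length = len(stored)

print(roundtrip_ok, original_length, stored_length)
```

The round trip succeeds byte for byte, which is why such setups appear to work, while the character count already shows that the stored string is not what the application thinks it is.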
MySQL 8.0 Collations: Migrating from older collations
From MySQL 8.0, utf8mb4 is the default character set, and the default collation for utf8mb4 is utf8mb4_0900_ai_ci. MySQL 8.0 also comes with a whole new set of Unicode collations for the utf8mb4 character set.
This will allow use of the complete Unicode 9.0.0 character set in MySQL, and for new applications this is great news.…
MySQL 8.0 Collations: The devil is in the details.
One of the challenges of language-specific collations is making sure they are accurate in the edge cases of sometimes lesser-used language features. Since I am Norwegian, let me use the Danish collation (which is identical to the Norwegian collation) as an example:
Most Scandinavian people know that in Danish (and Norwegian) we have three extra letters, ‘Æ’, ‘Ø’ and ‘Å’, and that they follow ‘Z’ in that order.…
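The basic ordering rule can be sketched in a few lines of Python. This is a toy illustration, not how MySQL implements its Danish collation: it only encodes the single fact that ‘Æ’, ‘Ø’ and ‘Å’ sort after ‘Z’, and the names `DANISH_ALPHABET`, `danish_key` and the word list are hypothetical.

```python
# Toy sort key: 26 basic letters followed by the three extra letters.
# Assumption: case is ignored and no other Danish rules are modelled.
DANISH_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZÆØÅ"
RANK = {ch: i for i, ch in enumerate(DANISH_ALPHABET)}

def danish_key(word):
    """Map each letter to its position in the alphabet above.

    Characters outside the alphabet sort after all known letters.
    """
    return [RANK.get(ch, len(RANK)) for ch in word.upper()]

words = ["Åse", "Zebra", "Øl", "Ærlig", "Abe"]
sorted_words = sorted(words, key=danish_key)
print(sorted_words)
```

A plain code point sort would place ‘Å’ before ‘Æ’ and ‘Ø’, which is one reason a language-specific collation is needed even for this simple rule.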