MySQL/MariaDB unable to handle unique keys with when using utf8mb4

Mugoma J. Okomba asked
mysql utf-8 character-encoding mariadb
via

We have MySQL table with utf8mb4 strings:

CREATE TABLE `test` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(191) COLLATE utf8mb4_unicode_ci NOT NULL,
`code` varchar(191) COLLATE utf8mb4_unicode_ci NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `test_code_unique` (`code`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

When inserting special characters there appears to be wrong conversion:

mysql> insert into `test` (`code`, `name`) values ('munster', 'Munster');


mysql> insert into `test` (`code`, `name`) values ('münster', 'Münster');
ERROR 1062 (23000): Duplicate entry 'münster' for key 'test_code_unique'


mysql> SELECT * FROM test WHERE code='münster';
+----+---------+---------+
| id | name    | code    |
+----+---------+---------+
|  1 | Munster | munster |
+----+---------+---------+
1 row in set (0.00 sec)



mysql> SELECT * FROM test WHERE code='munster';
+----+---------+---------+
| id | name    | code    |
+----+---------+---------+
|  1 | Munster | munster |
+----+---------+---------+
1 row in set (0.00 sec)

If unique key is removed second insert works but a search returns 2 rows even if query is different:

mysql> drop table test;


CREATE TABLE `test` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(191) COLLATE utf8mb4_unicode_ci NOT NULL,
`code` varchar(191) COLLATE utf8mb4_unicode_ci NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;


mysql> insert into `test` (`code`, `name`) values ('munster', 'Munster');


mysql> insert into `test` (`code`, `name`) values ('münster', 'Münster');


mysql> SELECT * FROM test WHERE code='münster';
+----+----------+----------+
| id | name     | code     |
+----+----------+----------+
|  1 | Munster  | munster  |
|  2 | Münster  | münster  |
+----+----------+----------+
2 rows in set (0.00 sec)


mysql> SELECT * FROM test WHERE code='munster';
+----+----------+----------+
| id | name     | code     |
+----+----------+----------+
|  1 | Munster  | munster  |
|  2 | Münster  | münster  |
+----+----------+----------+
2 rows in set (0.00 sec)

This has been tested on both MySQL 5.7 and MariaDB 10.2 and they both give same results.

What could be going wrong?


Answer
via

The reason for this seemingly mysterious problem is that you’re using utf8mb4_unicode_ci collation, and that collation intentionally ignores differences in accented characters vs non-accented characters. See: https://dev.mysql.com/doc/refman/5.7/en/charset-general.html

To resolve this, change collation on code column to utf8mb4_bin, which will distinguish between accented characters and non-accented characters, and also between caSe.

Share This
Posted in: