You can specify encodings on a per-column basis, at least with ISAM tables, so y...

lilyball · on June 9, 2017

That's remarkably bizarre. Who implements "UTF-8" but restricts it to only the BMP?

morgo · on June 9, 2017

It is historical. See:

http://mysqlserverteam.com/mysql-8-0-when-to-use-utf8mb3-ove...

http://mysqlserverteam.com/sushi-beer-an-introduction-of-utf... (Problem #1)

(Author is the first link here.)

flatline · on June 9, 2017

Remember this is for in-table storage, so it makes a certain amount of sense: capping each character at 3 bytes saves a byte per character compared with a 4-byte encoding that supports characters beyond the BMP (such as UTF-16 with surrogate pairs). You have a hard limit on the byte size of a table row, so how do you determine a priori how much storage a 20-character UTF-8 field will consume? The alternatives are to store the value in a CLOB field, or to set a hard byte count on the field and let the application or user be surprised when 20 printable characters are rejected. I don't actually know how other providers handle in-table Unicode fields, but at the very least MySQL made some poor naming choices.
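A rough Python sketch of the sizing arithmetic above, assuming the engine budgets every character at its character set's worst-case width (3 bytes for the BMP-only utf8mb3, 4 bytes for utf8mb4). The helper names `reserved_bytes` and `fits_utf8mb3` are hypothetical illustrations, not MySQL APIs.

```python
# Assumed worst-case bytes per character for the two MySQL character sets
# under discussion (utf8mb3 = BMP only, utf8mb4 = all of Unicode).
MAX_BYTES_PER_CHAR = {"utf8mb3": 3, "utf8mb4": 4}


def reserved_bytes(n_chars: int, charset: str) -> int:
    """Bytes a column must budget so that any n_chars-character value fits."""
    return n_chars * MAX_BYTES_PER_CHAR[charset]


def fits_utf8mb3(s: str) -> bool:
    """True if every code point is in the BMP (at most 3 bytes in UTF-8)."""
    return all(ord(ch) <= 0xFFFF for ch in s)


if __name__ == "__main__":
    print(reserved_bytes(20, "utf8mb3"))  # 60
    print(reserved_bytes(20, "utf8mb4"))  # 80
    # Sushi (U+1F363) and beer (U+1F37A) are outside the BMP.
    s = "sushi \U0001F363 beer \U0001F37A"
    print(len(s), len(s.encode("utf-8")), fits_utf8mb3(s))  # 14 20 False
```

The character count and the encoded byte count diverge as soon as non-BMP characters appear, which is exactly why a fixed per-character budget has to pick a maximum width up front.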

lilyball · on June 9, 2017

At a bare minimum you'd expect it to be something like utf-8bmp for 3-byte storage and utf-8 for 4-byte.

ars · on June 9, 2017

It's historical. When it was implemented, 4-byte Unicode did not exist.

Now that it does, they cannot change the names anymore.

It's the same reason Windows is stuck with UTF-16: when they implemented it, Unicode was 2 bytes.

lilyball · on June 9, 2017

> When it was implemented 4 byte unicode did not exist.

Incorrect. When UTF-8 was invented, it was actually variable up to 6 bytes in length, capable of representing code points up to U+7FFFFFFF. It was only restricted to 4 bytes in 2003 (RFC 3629). There is no point in history where UTF-8 was limited to only 3 bytes.
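To make the length rules concrete, here is a small Python sketch of how many bytes a code point needs in UTF-8, under both the original rules (up to 6 bytes, as in RFC 2279) and the RFC 3629 rules (up to 4 bytes, U+10FFFF). The function name `utf8_length` is a hypothetical helper for illustration. Note that the 3-byte boundary is simply the top of the BMP, not a historical cap.

```python
# Largest code point for each UTF-8 sequence length. The 5- and 6-byte
# forms existed in the original definition and were dropped by RFC 3629.
ORIGINAL_LIMITS = [
    (0x7F, 1),
    (0x7FF, 2),
    (0xFFFF, 3),       # end of the BMP
    (0x1FFFFF, 4),
    (0x3FFFFFF, 5),
    (0x7FFFFFFF, 6),
]
RFC3629_MAX = 0x10FFFF  # RFC 3629 ceiling, also Unicode's largest code point


def utf8_length(cp: int, original: bool = False) -> int:
    """Bytes needed to encode code point cp in UTF-8.

    original=True applies the pre-2003 rules (up to 6 bytes, U+7FFFFFFF);
    otherwise the RFC 3629 rules apply (up to 4 bytes, U+10FFFF).
    """
    if cp < 0:
        raise ValueError("negative code point")
    limit = ORIGINAL_LIMITS[-1][0] if original else RFC3629_MAX
    if cp > limit:
        raise ValueError(f"U+{cp:X} is not encodable under these rules")
    for max_cp, n_bytes in ORIGINAL_LIMITS:
        if cp <= max_cp:
            return n_bytes
    raise AssertionError("unreachable: limit check above covers all cases")
```

For the RFC 3629 range, the results match what Python's own UTF-8 codec produces, e.g. `len(chr(0x1F363).encode("utf-8"))` is 4.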

Dylan16807 · on June 10, 2017

That doesn't seem right.

The 1998 version of MySQL didn't support Unicode at all yet.

Unicode 2.0 introduced UTF-16 in 1996, making the need for non-BMP characters very explicit.

And UTF-8 at the time supported 31-bit code points.
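For reference, the mechanism Unicode 2.0 added to UTF-16 for non-BMP characters is the surrogate pair. A minimal Python sketch of the split (the name `to_surrogate_pair` is a hypothetical helper, not a library function):

```python
def to_surrogate_pair(cp: int) -> tuple[int, int]:
    """Split a supplementary-plane code point into a UTF-16 surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only U+10000..U+10FFFF need a surrogate pair")
    v = cp - 0x10000             # 20-bit offset into the supplementary planes
    high = 0xD800 + (v >> 10)    # top 10 bits -> high (lead) surrogate
    low = 0xDC00 + (v & 0x3FF)   # bottom 10 bits -> low (trail) surrogate
    return high, low


print([hex(u) for u in to_surrogate_pair(0x1F363)])  # ['0xd83c', '0xdf63']
```

So any character past U+FFFF takes 4 bytes in UTF-16 as well, which is the same per-character cost as 4-byte UTF-8.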