Looking at the Basic Multilingual Plane [1], UTF-8 will use > 2 bytes to encode ...

jeltz · on April 14, 2020

> All of Southeast Asia

Did you forget Indonesia, Vietnam, Malaysia, Brunei and the Philippines?

camgunz · on April 14, 2020

Again, here's what UTF-8 will use <= 2 bytes for:

Basic Latin (Lower half of ISO/IEC 8859-1: ISO/IEC 646:1991-IRV aka ASCII) (0000–007F)

Latin-1 Supplement (Upper half of ISO/IEC 8859-1) (0080–00FF)

Latin Extended-A (0100–017F)

Latin Extended-B (0180–024F)

IPA Extensions (0250–02AF)

Spacing Modifier Letters (02B0–02FF)

Combining Diacritical Marks (0300–036F)

Greek and Coptic (0370–03FF)

Cyrillic (0400–04FF)

Cyrillic Supplement (0500–052F)

Armenian (0530–058F)

Aramaic Scripts:

    Hebrew (0590–05FF)

    Arabic (0600–06FF)

    Syriac (0700–074F)

    Arabic Supplement (0750–077F)

    Thaana (0780–07BF)

    N'Ko (07C0–07FF)

In UTF-8, everything from U+0800 up requires > 2 bytes. Am I misunderstanding something? It's possible.
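For anyone who wants to check the cutoff rather than take the block list on faith, here's a quick sketch in Python. The helper name `utf8_len` and the sample code points are my own picks for illustration, not anything from the Unicode charts; it just asks the standard encoder how many bytes each code point takes.

```python
def utf8_len(code_point: int) -> int:
    """Return how many bytes UTF-8 uses for a single code point."""
    # chr() builds the one-character string; encode() gives its UTF-8 bytes.
    # Note: surrogates (U+D800..U+DFFF) are not encodable and would raise.
    return len(chr(code_point).encode("utf-8"))


# Code points on either side of each length boundary.
boundaries = [0x007F, 0x0080, 0x07FF, 0x0800, 0xFFFF, 0x10000]

for cp in boundaries:
    print(f"U+{cp:04X}: {utf8_len(cp)} byte(s)")
```

The last 2-byte code point is U+07FF (the end of the N'Ko block above), and U+0800 is the first one that takes 3 bytes. Anything past U+FFFF, i.e. outside the BMP, takes 4.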