@eggrobin eggrobin commented Oct 21, 2024

L2/24-232, Unicode request for compound tone diacritics III

[UTC-181-C34] Consensus: Provisionally assign 7 code points U+1ADE..U+1ADF and U+1AEC..U+1AF0 to combining diacritical marks as described in L2/24-232. [Ref: 2.7 in L2/24-228]

[185-C40] Consensus: UTC accepts for encoding in Unicode 18.0 the following 321 Arabic, Armenian, Bengali, Cuneiform, Devanagari, Hebrew, Kana, Khitan, Latin, Mongolian, Phonetic and other symbol characters for which code points have previously been assigned:

  1. Arabic (39 characters—ref. 180-C22, 180-C26): 10EC9..10ECF, 10ED9..10EEE, 10EF0..10EF9
  2. Armenian (3 characters—ref. 179-C46): 0558, 058B..058C
  3. Bengali (1 character—ref. 180-C30): 0984
  4. Cuneiform numerals (12 characters—ref. 182-C3): 1246F, 12475..1247F
  5. Devanagari (1 character—ref. 182-C5): 11B0A
  6. Hebrew (1 character—ref. 182-C4): 05C8
  7. Kana (7 characters—ref. 180-C6, 182-C31, 183-C54, 184-C38): 1B123..1B125, 1B126, 1B127..1B128, 1B168
  8. Khitan (5 characters—ref. 184-C5): 18CD6..18CDA
  9. Latin (54 characters—ref. 181-C8, 181-C10, 182-C6, 182-C7, 182-C8, 182-C9, 183-C8): 2E60..2E63, A7DD, A7E2, AB6C..AB6D, 1DF57..1DF59, 1DF5A..1DF66, 1DF67, 1DF68..1DF81, 1DFCD..1DFCF
  10. Mongolian (1 character—ref. 178-C30): 1879
  11. Phonetic (114 characters—ref. 179-C55, 179-C59, 179-C60, 180-C32, 180-C33, 180-C34, 180-C35, 180-C36, 180-C37, 181-C33, 181-C34, 181-C35, 181-C36, 181-C45, 183-C10): 1ADE..1ADF, 1AEC..1AF0, 208F, 209D..209F, 107BB..107BF, 1DF1F..1DF24, 1DF2B..1DF2C, 1DF2D..1DF3A, 1DF3B..1DF3D, 1DF3E..1DF3F, 1DF40..1DF56, 1DFD0, 1DFD1, 1DFD2..1DFD7, 1DFD8..1DFE8, 1DFE9..1DFF2, 1DFF3..1DFF4, 1DFF5..1DFF9, 1DFFA..1DFFF
  12. Symbols (81 characters—ref. 178-C31, 178-C36, 178-C37, 180-C38, 180-C39, 180-C40, 181-C38, 181-C39, 181-C40, 182-C10, 182-C11, 183-C12, 183-C13, 184-C18): 20C2, 1CEF1..1CEF5, 1D127..1D128, 1D1EB..1D1F6, 1D1F7..1D1FE, 1D1FF, 1D250..1D255, 1D256..1D25A, 1D25B..1D25F, 1D260, 1D261, 1D262..1D27F, 1D280..1D281, 1F1AE, 1F7DA
  13. Tangut (2 characters—ref. 183-C7, 184-C4): 18D1F..18D20
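
The per-script counts in the consensus can be cross-checked against the listed ranges. The following is a minimal sketch, not part of the PR: the helper name `count_code_points` and the `STATED` table are illustrative, and only a few of the thirteen entries are copied in.

```python
def count_code_points(range_list: str) -> int:
    """Count code points in a list such as '0558, 058B..058C' (hex, inclusive)."""
    total = 0
    for item in range_list.split(","):
        item = item.strip()
        if not item:
            continue
        first, _, last = item.partition("..")
        start = int(first, 16)
        end = int(last, 16) if last else start  # a single code point has no '..'
        total += end - start + 1
    return total


# (stated count, ranges) copied from the consensus text above.
STATED = {
    "Arabic": (39, "10EC9..10ECF, 10ED9..10EEE, 10EF0..10EF9"),
    "Armenian": (3, "0558, 058B..058C"),
    "Kana": (7, "1B123..1B125, 1B126, 1B127..1B128, 1B168"),
    "Tangut": (2, "18D1F..18D20"),
}

for script, (stated, ranges) in STATED.items():
    counted = count_code_points(ranges)
    status = "ok" if counted == stated else "MISMATCH"
    print(f"{script}: stated {stated}, counted {counted} ({status})")
```

The same helper applied to the two ranges provisionally assigned in UTC-181-C34 (`1ADE..1ADF, 1AEC..1AF0`) gives the 7 code points stated there.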

@eggrobin eggrobin marked this pull request as ready for review November 21, 2025 19:18
@eggrobin eggrobin requested a review from markusicu November 21, 2025 19:18
Comment on lines +779 to +781
1ABF..1AEA ; CM # Mn [44] COMBINING LATIN SMALL LETTER W BELOW..COMBINING UPWARDS ARROW ABOVE
1AEB ; GL # Mn COMBINING DOUBLE RIGHTWARDS ARROW ABOVE
1AEC..1AF0 ; CM # Mn [5] COMBINING CARON-ACUTE..COMBINING DOUBLE COMMA ABOVE

Contributor

When they span two base characters, they are GL. When they just go over one base character, they are CM. The naming intersection is a little unfortunate.

Member

Ok, and looking at L2/24-232 the two new double accents below go under just one base character. Thanks!
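
To illustrate the CM/GL split discussed in this thread, here is a small sketch that parses the three data lines under review and looks up the line break class of a code point. It assumes only the `range ; class # comment` layout visible in those lines; the function names `parse` and `line_break_class` are hypothetical and are not part of any Unicode tooling.

```python
# The three data lines under review, copied from the diff above.
LINES = """\
1ABF..1AEA ; CM # Mn [44] COMBINING LATIN SMALL LETTER W BELOW..COMBINING UPWARDS ARROW ABOVE
1AEB ; GL # Mn COMBINING DOUBLE RIGHTWARDS ARROW ABOVE
1AEC..1AF0 ; CM # Mn [5] COMBINING CARON-ACUTE..COMBINING DOUBLE COMMA ABOVE
"""


def parse(text):
    """Return a list of (start, end, class) tuples from 'range ; class # comment' lines."""
    entries = []
    for line in text.splitlines():
        data = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not data:
            continue
        code_range, lb_class = (field.strip() for field in data.split(";"))
        first, _, last = code_range.partition("..")
        start = int(first, 16)
        end = int(last, 16) if last else start
        entries.append((start, end, lb_class))
    return entries


def line_break_class(cp, entries):
    """Look up the class for a code point; None if this excerpt does not cover it."""
    for start, end, lb_class in entries:
        if start <= cp <= end:
            return lb_class
    return None


ENTRIES = parse(LINES)
print(line_break_class(0x1AEB, ENTRIES))  # GL: the double arrow spans two base characters
print(line_break_class(0x1AF0, ENTRIES))  # CM: sits over a single base character
```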

@markusicu
Member

Should we add the NamesList.txt annotations that L2/24-232 proposes? Or leave those to @Ken-Whistler ?

1AF0 COMBINING DOUBLE COMMA ABOVE
→ 0313 COMBINING COMMA ABOVE
→ 02EE MODIFIER LETTER DOUBLE APOSTROPHE

@eggrobin
Member Author

> Should we add the NamesList.txt annotations that L2/24-232 proposes? Or leave those to @Ken-Whistler ?

We leave the names list to @Ken-Whistler.

@eggrobin eggrobin merged commit 9109e95 into unicode-org:main Nov 21, 2025
19 of 20 checks passed