Iŋliʃ fɷnotipic ɑlfɑbet #1035

eggrobin · 2025-02-11T16:53:07Z

[182-C7] Consensus: Provisionally assign 31 code points U+2E60..U+2E63, U+A7DD and U+1DF68..U+1DF81, in the Supplemental Punctuation, Latin Extended-D and Latin Extended-G blocks, to characters for EPA with names and code points as described in Section 2.2 of L2/24-277. [Ref. 1.6 in L2/25-010]

[185-C40] Consensus: UTC accepts for encoding in Unicode 18.0 the following 321 Arabic, Armenian, Bengali, Cuneiform, Devanagari, Hebrew, Kana, Khitan, Latin, Mongolian, Phonetic and other symbol characters for which code points have previously been assigned:

Arabic (39 characters—ref. 180-C22, 180-C26): 10EC9..10ECF, 10ED9..10EEE, 10EF0..10EF9
Armenian (3 characters—ref. 179-C46): 0558, 058B..058C
Bengali (1 character—ref. 180-C30): 0984
Cuneiform numerals (12 characters—ref. 182-C3): 1246F, 12475..1247F
Devanagari (1 character—ref. 182-C5): 11B0A
Hebrew (1 character—ref. 182-C4): 05C8
Kana (7 characters—ref. 180-C6, 182-C31, 183-C54, 184-C38): 1B123..1B125, 1B126, 1B127..1B128, 1B168
Khitan (5 characters—ref. 184-C5): 18CD6..18CDA
Latin (54 characters—ref. 181-C8, 181-C10, 182-C6, 182-C7, 182-C8, 182-C9, 183-C8): 2E60..2E63, A7DD, A7E2, AB6C..AB6D, 1DF57..1DF59, 1DF5A..1DF66, 1DF67, 1DF68..1DF81, 1DFCD..1DFCF
Mongolian (1 character—ref. 178-C30): 1879
Phonetic (114 characters—ref. 179-C55, 179-C59, 179-C60, 180-C32, 180-C33, 180-C34, 180-C35, 180-C36, 180-C37, 181-C33, 181-C34, 181-C35, 181-C36, 181-C45, 183-C10): 1ADE..1ADF, 1AEC..1AF0, 208F, 209D..209F, 107BB..107BF, 1DF1F..1DF24, 1DF2B..1DF2C, 1DF2D..1DF3A, 1DF3B..1DF3D, 1DF3E..1DF3F, 1DF40..1DF56, 1DFD0, 1DFD1, 1DFD2..1DFD7, 1DFD8..1DFE8, 1DFE9..1DFF2, 1DFF3..1DFF4, 1DFF5..1DFF9, 1DFFA..1DFFF
Symbols (81 characters—ref. 178-C31, 178-C36, 178-C37, 180-C38, 180-C39, 180-C40, 181-C38, 181-C39, 181-C40, 182-C10, 182-C11, 183-C12, 183-C13, 184-C18): 20C2, 1CEF1..1CEF5, 1D127..1D128, 1D1EB..1D1F6, 1D1F7..1D1FE, 1D1FF, 1D250..1D255, 1D256..1D25A, 1D25B..1D25F, 1D260, 1D261, 1D262..1D27F, 1D280..1D281, 1F1AE, 1F7DA
Tangut (2 characters—ref. 183-C7, 184-C4): 18D1F..18D20

unicode-org/sah#456
https://github.com/unicode-org/utc-release-management/issues/179

kirkrmiller · 2025-11-07T22:18:33Z

Do pull requests need to specify when characters should be added to the PropList file for the Soft_Dotted property? Three of the characters here do: U+1DF6F, U+1DF70 and U+1DF71.

eggrobin · 2025-11-07T22:59:36Z

Good catch. I will fix that…

eggrobin · 2025-11-11T14:00:41Z

@markusicu We had reported on the properties of these characters in L2/25-087, pp. 14 sq., but nobody had spotted the soft dotted issue; ideally I would like to include a note about the updated comparisons in the report to UTC-186, but I am not sure how to do that (I guess I could reopen the SAH issue, but that seems a bit over the top).

What do you think?

markusicu · 2025-11-14T22:55:19Z

unicodetools/data/ucd/dev/UnicodeData.txt

+1DF68;LATIN CAPITAL LETTER PHONOTYPIC A WITH SWASH;Lu;0;L;;;;;N;;;;1DF69;
+1DF69;LATIN SMALL LETTER PHONOTYPIC A WITH SWASH;Ll;0;L;;;;;N;;;1DF68;;1DF68
+1DF6A;LATIN CAPITAL LETTER PHONOTYPIC ROUNDTOP A;Lu;0;L;;;;;N;;;;1DF6B;
+1DF6B;LATIN SMALL LETTER PHONOTYPIC ROUNDTOP A;Ll;0;L;;;;;N;;;1DF6A;;1DF6A


Too bad that we missed early on that we have another range of characters with alternating small & capital letters :-(
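
A minimal sketch of the kind of check this range invites, assuming only the standard UnicodeData.txt field layout (fields 12, 13 and 14 are the simple uppercase, lowercase and titlecase mappings). The helper names are hypothetical and are not part of unicodetools; the sample data is the four lines quoted above.

```python
# Sketch: check the case pairs in the UnicodeData.txt lines quoted above.
SAMPLE = """\
1DF68;LATIN CAPITAL LETTER PHONOTYPIC A WITH SWASH;Lu;0;L;;;;;N;;;;1DF69;
1DF69;LATIN SMALL LETTER PHONOTYPIC A WITH SWASH;Ll;0;L;;;;;N;;;1DF68;;1DF68
1DF6A;LATIN CAPITAL LETTER PHONOTYPIC ROUNDTOP A;Lu;0;L;;;;;N;;;;1DF6B;
1DF6B;LATIN SMALL LETTER PHONOTYPIC ROUNDTOP A;Ll;0;L;;;;;N;;;1DF6A;;1DF6A
"""


def parse_unicode_data(text):
    """Map each code point (hex string) to its gc and simple case mappings."""
    records = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split(";")
        records[fields[0]] = {
            "gc": fields[2],
            "upper": fields[12],
            "lower": fields[13],
            "title": fields[14],
        }
    return records


def asymmetric_mappings(records):
    """Return code points whose upper/lower mapping is not mirrored by its target."""
    problems = []
    for cp, rec in records.items():
        if rec["lower"] and records.get(rec["lower"], {}).get("upper") != cp:
            problems.append(cp)
        if rec["upper"] and records.get(rec["upper"], {}).get("lower") != cp:
            problems.append(cp)
    return problems


def alternates_capital_small(records):
    """True if every Lu code point with a lowercase mapping maps to the next code point."""
    return all(
        int(rec["lower"], 16) == int(cp, 16) + 1
        for cp, rec in records.items()
        if rec["gc"] == "Lu" and rec["lower"]
    )


records = parse_unicode_data(SAMPLE)
print(asymmetric_mappings(records))
print(alternates_capital_small(records))
```

On the four quoted lines the mappings are mirrored and each capital is immediately followed by its small letter, which is the alternating layout mentioned in the comment.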

@roozbehp @Ken-Whistler @PeterConstable FYI

markusicu · 2025-11-14T22:59:09Z

unicodetools/data/ucd/dev/UnicodeData.txt

+1DF80;LATIN CAPITAL LETTER A WITH TOPBAR;Lo;0;L;;;;;N;;;;;
+1DF81;LATIN CAPITAL LETTER E WITH BENT TOPBAR;Lo;0;L;;;;;N;;;;;


It seems weird that the uppercase-only letters are gc=Lo while the lowercase-only letters (1DF70, 1DF71) are gc=Ll, but I see in https://github.com/unicode-org/sah/issues/456 that we discussed this...

It feels like they should at least be Other_Uppercase (and thus also Cased).

@Ken-Whistler @macchiati @PeterConstable FYI

seeing no response, but I don't want to just forget about this:

https://github.com/unicode-org/properties/issues/497
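
To make the Other_Uppercase remark concrete, here is a small sketch of how the derived properties are defined in the UCD: Uppercase is Lu plus Other_Uppercase, Lowercase is Ll plus Other_Lowercase, and Cased is Lowercase, Uppercase and Lt together. The function names are hypothetical, and whether U+1DF80 and U+1DF81 should get Other_Uppercase is exactly the open question, not something this sketch decides.

```python
def is_uppercase(gc, other_uppercase=False):
    """Derived Uppercase: gc=Lu or Other_Uppercase."""
    return gc == "Lu" or other_uppercase


def is_lowercase(gc, other_lowercase=False):
    """Derived Lowercase: gc=Ll or Other_Lowercase."""
    return gc == "Ll" or other_lowercase


def is_cased(gc, other_uppercase=False, other_lowercase=False):
    """Derived Cased: Lowercase, Uppercase, or gc=Lt."""
    return (
        gc == "Lt"
        or is_uppercase(gc, other_uppercase)
        or is_lowercase(gc, other_lowercase)
    )


# U+1DF80 as in the diff above: gc=Lo and no Other_Uppercase.
print(is_cased("Lo"))
# The same character if it were given Other_Uppercase (the suggestion above).
print(is_cased("Lo", other_uppercase=True))
# The lowercase-only letters (gc=Ll) are Cased without any contributory property.
print(is_cased("Ll"))
```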

markusicu · 2025-11-14T23:14:37Z

unicodetools/data/ucd/dev/extracted/DerivedGeneralCategory.txt

 2DD7          ; Cn #       <reserved-2DD7>
 2DDF          ; Cn #       <reserved-2DDF>
-2E5E..2E7F    ; Cn #  [34] <reserved-2E5E>..<reserved-2E7F>
+2E5E..2E5F    ; Cn #   [2] <reserved-2E5E>..<reserved-2E5F>


FYI: I think we should stop printing gc=Cn lines here.
PVA.txt does have

# @missing: 0000..10FFFF; General_Category; Unassigned

Unless you disagree, I can create a PAG issue.

https://github.com/unicode-org/properties/issues/498
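
A rough sketch of why the explicit gc=Cn lines are redundant for a consumer that honours the `@missing` line. The parser below is hypothetical (it is not the unicodetools or ICU parser), and the sample text combines the `@missing` line quoted above with data lines modelled on the excerpts in this thread; it is not a real file.

```python
import re

MISSING = re.compile(
    r"#\s*@missing:\s*([0-9A-F]+)\.\.([0-9A-F]+)\s*;\s*([^;]+?)\s*;\s*(\S+)"
)
DATA_LINE = re.compile(r"^([0-9A-F]+)(?:\.\.([0-9A-F]+))?\s*;\s*(\S+)")

# Illustrative input only: no Cn lines are listed.
SAMPLE = """\
# @missing: 0000..10FFFF; General_Category; Unassigned
2E60..2E61    ; Po
2E62          ; Ps
2E63          ; Pe
"""


def parse(text):
    """Return (default value, list of (start, end, value)) from the text."""
    default = None
    ranges = []
    for raw in text.splitlines():
        line = raw.strip()
        m = MISSING.match(line)
        if m:
            default = m.group(4)
            continue
        m = DATA_LINE.match(line)
        if m:
            start = int(m.group(1), 16)
            end = int(m.group(2) or m.group(1), 16)
            ranges.append((start, end, m.group(3)))
    return default, ranges


def general_category(cp, default, ranges):
    """Look up a code point, falling back to the @missing default."""
    for start, end, value in ranges:
        if start <= cp <= end:
            return value
    return default


default, ranges = parse(SAMPLE)
print(general_category(0x2E5E, default, ranges))
print(general_category(0x2E62, default, ranges))
```

The unlisted U+2E5E falls back to the default from the `@missing` line, so nothing is lost by omitting the reserved ranges.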

markusicu · 2025-11-14T23:20:15Z

unicodetools/data/ucd/dev/PropList.txt

+2E60..2E61    ; Pattern_Syntax # Po   [2] WIGGLY EXCLAMATION MARK..INVERTED WIGGLY EXCLAMATION MARK
+2E62          ; Pattern_Syntax # Ps       LEFT PARENTHESIS WITH MIDDLE RING
+2E63          ; Pattern_Syntax # Pe       RIGHT PARENTHESIS WITH MIDDLE RING


nifty new Pattern_Syntax ;-)

markusicu · 2025-11-29T03:08:08Z

CI check failures

Error:    TestTestUnicodeInvariants.testAdditionComparisons:64
 TestUnicodeInvariants.testInvariants(addition-comparisons) failed ==>
 expected: <0> but was: <1>

Error:    TestVersionedSymbolTable.testIdentityAndNullQueries:174
 Expected \p{Bidi_Paired_Bracket_Type=None} ⊇ \p{Bidi_Paired_Bracket=@none@} =
 [^()\[\]\{\}\u0F3A-\u0F3D\u169B\u169C\u2045\u2046\u207D\u207E\u208D\u208E\u2308-\u230B\u2329\u232A\u2768-\u2775\u27C5\u27C6\u27E6-\u27EF\u2983-\u2998\u29D8-\u29DB\u29FC\u29FD\u2E22-\u2E29\u2E55-\u2E5C\u3008-\u3011\u3014-\u301B\uFE59-\uFE5E\uFF08\uFF09\uFF3B\uFF3D\uFF5B\uFF5D\uFF5F\uFF60\uFF62\uFF63]
 but \p{Bidi_Paired_Bracket=@none@}
 contains unexpected [\u2E62\u2E63] ==>
 expected: <true> but was: <false>

markusicu · 2025-11-29T03:24:10Z

> @markusicu We had reported on the properties of these characters in L2/25-087, pp. 14 sq., but nobody had spotted the soft dotted issue; ideally I would like to include a note about the updated comparisons in the report to UTC-186, but I am not sure how to do that (I guess I could reopen the SAH issue, but that seems a bit over the top).
>
> What do you think?

I created https://github.com/unicode-org/properties/issues/496

markusicu · 2025-11-29T03:37:58Z

> @markusicu We had reported on the properties of these characters in L2/25-087, pp. 14 sq., but nobody had spotted the soft dotted issue; ideally I would like to include a note about the updated comparisons in the report to UTC-186, but I am not sure how to do that (I guess I could reopen the SAH issue, but that seems a bit over the top).
> What do you think?

I created unicode-org/properties#496

The proposal L2/24-277 did include:

> The following characters have to get the “soft-dotted” property:
>
> U+1DF6F LATIN SMALL LETTER PHONOTYPIC DIPHTHONG AI
> U+1DF70 LATIN SMALL LETTER I WITH PIGTAIL AT BOTTOM
> U+1DF71 LATIN SMALL LETTER STRETCHED I

... so maybe we should just give them Soft_Dotted and move on?
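
A sketch of a check that would have flagged the omission, assuming PropList.txt-style lines of the form `range ; Property # comment`. The helper and the `AFTER` line are hypothetical (the actual PropList.txt line added by this PR is not shown in this thread); the first line is an illustrative Soft_Dotted entry.

```python
import re

LINE = re.compile(r"^([0-9A-F]+)(?:\.\.([0-9A-F]+))?\s*;\s*(\w+)")

# The three characters the proposal says must be Soft_Dotted.
REQUIRED_SOFT_DOTTED = {0x1DF6F, 0x1DF70, 0x1DF71}

# Illustrative excerpts, not the real file contents.
BEFORE = "0069..006A    ; Soft_Dotted # L&   [2] LATIN SMALL LETTER I..LATIN SMALL LETTER J\n"
AFTER = BEFORE + "1DF6F..1DF71  ; Soft_Dotted\n"  # hypothetical added line


def code_points_with(prop, text):
    """Collect every code point listed for the given property."""
    found = set()
    for line in text.splitlines():
        m = LINE.match(line)
        if m and m.group(3) == prop:
            start = int(m.group(1), 16)
            end = int(m.group(2) or m.group(1), 16)
            found.update(range(start, end + 1))
    return found


def missing_soft_dotted(text):
    """Return the required code points that the text does not list as Soft_Dotted."""
    return sorted(REQUIRED_SOFT_DOTTED - code_points_with("Soft_Dotted", text))


print([f"U+{cp:04X}" for cp in missing_soft_dotted(BEFORE)])
print([f"U+{cp:04X}" for cp in missing_soft_dotted(AFTER)])
```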

eggrobin · 2025-11-29T11:43:56Z

> ... so maybe we should just give them Soft_Dotted and move on?

We definitely should, and I do not think we should block this PR on PAG rubberstamping this soft-dottedness. But since an earlier PAG report said something wrong about these characters, it might be useful to correct the record in the next PAG report. I will fill in https://github.com/unicode-org/properties/issues/496 accordingly.

eggrobin · 2025-11-29T14:23:39Z

Error:    TestVersionedSymbolTable.testIdentityAndNullQueries:174
 Expected \p{Bidi_Paired_Bracket_Type=None} ⊇ \p{Bidi_Paired_Bracket=@none@} =
 [^()\[\]\{\}\u0F3A-\u0F3D\u169B\u169C\u2045\u2046\u207D\u207E\u208D\u208E\u2308-\u230B\u2329\u232A\u2768-\u2775\u27C5\u27C6\u27E6-\u27EF\u2983-\u2998\u29D8-\u29DB\u29FC\u29FD\u2E22-\u2E29\u2E55-\u2E5C\u3008-\u3011\u3014-\u301B\uFE59-\uFE5E\uFF08\uFF09\uFF3B\uFF3D\uFF5B\uFF5D\uFF5F\uFF60\uFF62\uFF63]
 but \p{Bidi_Paired_Bracket=@none@}
 contains unexpected [\u2E62\u2E63] ==>
 expected: <true> but was: <false>

… I am confused by this one.

eggrobin · 2025-11-29T15:12:33Z

> … I am confused by this one.

On my machine it works!?

markusicu · 2025-11-29T16:34:44Z

Any chance that the server sees the Unicode 17 version of Bidi_Paired_Bracket?

eggrobin · 2025-11-29T16:36:53Z

Maybe, but how?

markusicu · 2025-11-29T16:58:20Z

maybe comment out this one test and move on for now?

eggrobin · 2025-11-29T17:57:03Z

It was yet another static cache (this time of the set of unassigned code points)… 😩

From 2011, with this comment:

     * Reset the cache properties. Must be done if the version of Unicode is different than the ICU one, AND any UnicodeProperty has already been instantiated.
     * TODO make this a bit more robust.

Making this a bit more robust sure would have been nice…

Revert "CI is haunted" This reverts commit 6e9b5f5.
Revert "moo" This reverts commit 2eec56c.
Revert "meow?" This reverts commit 6bff11e.
Revert "meow" This reverts commit 35fe8e8.
Revert "more traces…" This reverts commit 8a8d5be.
Revert "traces" This reverts commit f88af9a.
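
For readers who have not hit this class of bug, a toy model of the failure mode described above: a static cache that is filled under one Unicode version and silently reused after the version changes. Everything here is hypothetical (names, the two-version table, the sampled range); it is not the unicodetools code, only an illustration of why the reset mentioned in the 2011 comment is needed.

```python
# Toy data: which code points in the sampled range are assigned per version.
ASSIGNED_BY_VERSION = {
    "17.0": frozenset(),
    "18.0": frozenset({0x2E60, 0x2E61, 0x2E62, 0x2E63}),
}
SAMPLE_RANGE = range(0x2E5E, 0x2E80)

_version = "17.0"
_unassigned_cache = None  # static and not keyed by version: the source of staleness


def set_version(version):
    """Switch the active version WITHOUT touching the cache (the bug)."""
    global _version
    _version = version


def reset_cache():
    """Drop the cached set so that it is recomputed for the active version."""
    global _unassigned_cache
    _unassigned_cache = None


def unassigned():
    """Return the unassigned code points, computing them only on first use."""
    global _unassigned_cache
    if _unassigned_cache is None:
        assigned = ASSIGNED_BY_VERSION[_version]
        _unassigned_cache = frozenset(
            cp for cp in SAMPLE_RANGE if cp not in assigned
        )
    return _unassigned_cache


unassigned()                      # something instantiates the cache under 17.0
set_version("18.0")
stale = 0x2E62 in unassigned()    # still answered from the 17.0 cache
reset_cache()
fresh = 0x2E62 in unassigned()    # recomputed for 18.0
print(stale, fresh)
```

Keying the cache by version, or not caching at all, would remove the need for callers to remember the reset.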

markusicu · 2025-11-29T18:11:56Z

Poor you... but thanks for chasing this down!

Maybe setDefaultXSymbolTable() should just call ResetCacheProperties()?

eggrobin · 2025-11-29T18:14:29Z

> Maybe setDefaultXSymbolTable() should just call ResetCacheProperties()?

The former lives in ICU, and the latter in the tools, so not really an option.

But I think UnicodeProperty is in dire need of refactoring, and one part of that could be making whatever caching there is correct (or even getting rid of it if it turns out not to be useful).

eggrobin added 18 commits February 11, 2025 14:18

UnicodeData.txt lines from L2/24-277

f1553b9

Typos in the UnicodeData.txt lines

98755ef

Another typo

4a8cd3f

lb assignments according to the proposal, note lb=CL rather than lb=CP for U+2E63.

2d4a7be

Latin letters, Common punctuation

a9986cc

Regenerate UCD

de5edbc

Failing test for the case pairs

d06adf7

Another typo in the UnicodeData lines

27ff4e1

up to block

0ea309e

Regenerate UCD

253fdb7

Test the unpaired lowercase letters

32b0a68

Test ɷ

0ed1ae8

failing test for the parentheses

3df69cd

bpb bmg

0d55f00

Failing test for the exclamation marks

f42e1f3

Terminal wigglies

8c7b347

Regenerate UCD

c9eac30

Test passes

7e468f3

eggrobin added data-for-new pipeline-provisionally-assigned labels Feb 11, 2025

eggrobin added 4 commits February 11, 2025 19:02

Merge remote-tracking branch 'la-vache/main' into EPA

a921007

Lo and compare them to ꟻ

ab6cdab

Regenerate UCD

b923d3d

Ignore Block

bf767ac

eggrobin added the ucd-δ-needs-revision label (have data, but UTC approved changes that need to be made) Nov 7, 2025

eggrobin added 3 commits November 11, 2025 14:04

Merge remote-tracking branch 'la-vache/main' into EPA

35792be

More apt (and failing) comparison for the i-like letters

ad196d1

Soften the dots

c4f12f4

eggrobin added the pipeline-18.0 label Nov 11, 2025

eggrobin marked this pull request as ready for review November 11, 2025 13:57

eggrobin requested a review from markusicu November 11, 2025 14:00

markusicu reviewed Nov 14, 2025

Merge remote-tracking branch 'la-vache/main' into EPA

1667590

eggrobin requested a review from markusicu November 29, 2025 01:46

Ignore 𝼚’s Do_Not_Emit sequence

b6d5b1a

eggrobin added 2 commits November 29, 2025 17:39

traces

f88af9a

more traces…

8a8d5be

eggrobin added 5 commits November 29, 2025 18:06

meow

35fe8e8

meow?

6bff11e

moo

2eec56c

CI is haunted

6e9b5f5

…

4e885e9

markusicu approved these changes Nov 29, 2025

eggrobin merged commit 7d736b1 into unicode-org:main Nov 29, 2025
15 of 16 checks passed
