Iŋliʃ fɷnotipic ɑlfɑbet #1035
Good catch. I will fix that…
@markusicu We had reported on the properties of these characters in L2/25-087, pp. 14 sq., but nobody had spotted the soft-dotted issue. Ideally I would like to include a note about the updated comparisons in the report to UTC-186, but I am not sure how to do that (I guess I could reopen the SAH issue, but that seems a bit over the top). What do you think?
1DF68;LATIN CAPITAL LETTER PHONOTYPIC A WITH SWASH;Lu;0;L;;;;;N;;;;1DF69;
1DF69;LATIN SMALL LETTER PHONOTYPIC A WITH SWASH;Ll;0;L;;;;;N;;;1DF68;;1DF68
1DF6A;LATIN CAPITAL LETTER PHONOTYPIC ROUNDTOP A;Lu;0;L;;;;;N;;;;1DF6B;
1DF6B;LATIN SMALL LETTER PHONOTYPIC ROUNDTOP A;Ll;0;L;;;;;N;;;1DF6A;;1DF6A
Too bad that we missed early on that we have another range of characters with alternating small & capital letters :-(
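For illustration only, a minimal Python sketch of what "alternating" means here: it parses the four UnicodeData.txt records quoted in the diff and checks that each capital maps to the immediately following small letter and back. The helper names `parse` and `alternating_pairs` are made up for this note and are not part of any Unicode tooling.

```python
# The four UnicodeData.txt records quoted from the diff in this thread.
RECORDS = """\
1DF68;LATIN CAPITAL LETTER PHONOTYPIC A WITH SWASH;Lu;0;L;;;;;N;;;;1DF69;
1DF69;LATIN SMALL LETTER PHONOTYPIC A WITH SWASH;Ll;0;L;;;;;N;;;1DF68;;1DF68
1DF6A;LATIN CAPITAL LETTER PHONOTYPIC ROUNDTOP A;Lu;0;L;;;;;N;;;;1DF6B;
1DF6B;LATIN SMALL LETTER PHONOTYPIC ROUNDTOP A;Ll;0;L;;;;;N;;;1DF6A;;1DF6A
"""


def parse(text):
    """Map code point -> gc and simple upper/lower mappings (hypothetical helper)."""
    table = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split(";")
        # UnicodeData.txt: field 2 is General_Category,
        # field 12 is Simple_Uppercase_Mapping, field 13 is Simple_Lowercase_Mapping.
        table[int(fields[0], 16)] = {
            "gc": fields[2],
            "upper": int(fields[12], 16) if fields[12] else None,
            "lower": int(fields[13], 16) if fields[13] else None,
        }
    return table


def alternating_pairs(table):
    """Return (capital, small) pairs where small == capital + 1 and the mappings round-trip."""
    pairs = []
    for cp, rec in sorted(table.items()):
        if rec["gc"] != "Lu" or rec["lower"] is None:
            continue
        small = rec["lower"]
        if small == cp + 1 and small in table and table[small]["upper"] == cp:
            pairs.append((cp, small))
    return pairs


PAIRS = alternating_pairs(parse(RECORDS))
print([(f"{a:04X}", f"{b:04X}") for a, b in PAIRS])
```

A check like this over the full file would be one way to list every such range, so that none is overlooked.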
1DF80;LATIN CAPITAL LETTER A WITH TOPBAR;Lo;0;L;;;;;N;;;;;
1DF81;LATIN CAPITAL LETTER E WITH BENT TOPBAR;Lo;0;L;;;;;N;;;;;
It seems weird that the uppercase-only letters are gc=Lo while the lowercase-only letters (1DF70, 1DF71) are gc=Ll, but I see in https://github.com/unicode-org/sah/issues/456 that we discussed this...
It feels like they should at least be Other_Uppercase (and thus also Cased).
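To spell out the "and thus also Cased" step, here is a small Python sketch of the derivations as documented in DerivedCoreProperties.txt (Uppercase = Lu + Other_Uppercase, Lowercase = Ll + Other_Lowercase, Cased = Lowercase + Uppercase + Lt). The function `derive` and its parameters are hypothetical; this is a model of the rule, not the generator's code.

```python
def derive(gc, other_uppercase=False, other_lowercase=False):
    """Derive Uppercase, Lowercase and Cased from gc plus the contributory properties."""
    uppercase = gc == "Lu" or other_uppercase
    lowercase = gc == "Ll" or other_lowercase
    cased = uppercase or lowercase or gc == "Lt"
    return {"Uppercase": uppercase, "Lowercase": lowercase, "Cased": cased}


# gc=Lo with no contributory property, as in the diff for 1DF80 and 1DF81.
AS_IN_DIFF = derive("Lo")
# The same letters if they were given Other_Uppercase.
WITH_OTHER_UPPERCASE = derive("Lo", other_uppercase=True)
# A lowercase-only letter with gc=Ll is Cased without any extra property.
LOWERCASE_ONLY = derive("Ll")

print(AS_IN_DIFF, WITH_OTHER_UPPERCASE, LOWERCASE_ONLY)
```

Under this model a gc=Lo letter is neither Uppercase nor Cased unless Other_Uppercase is set, which is the asymmetry with the gc=Ll letters noted above.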
Seeing no response, but I don't want to just forget about this:
2DD7 ; Cn # <reserved-2DD7>
2DDF ; Cn # <reserved-2DDF>
2E5E..2E7F ; Cn # [34] <reserved-2E5E>..<reserved-2E7F>
2E5E..2E5F ; Cn # [2] <reserved-2E5E>..<reserved-2E5F>
FYI: I think we should stop printing gc=Cn lines here.
PropertyValueAliases.txt (PVA.txt) already has this line:
# @missing: 0000..10FFFF; General_Category; Unassigned
Unless you disagree, I can create a PAG issue.
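As a rough illustration of why explicit gc=Cn lines are redundant, the Python sketch below parses the `@missing` line quoted above and uses it as the fallback for any code point without an explicit entry. The names `parse_missing`, `general_category` and the tiny `EXPLICIT` table are assumptions made for this example, not the actual parser.

```python
import re

# Matches lines of the form "# @missing: 0000..10FFFF; Property; Value".
MISSING = re.compile(
    r"#\s*@missing:\s*([0-9A-F]{4,6})\.\.([0-9A-F]{4,6})\s*;\s*([^;]+?)\s*;\s*(\S+)"
)

# Long value name -> short alias (Cn is the short alias of Unassigned).
ALIAS = {"Unassigned": "Cn"}


def parse_missing(line):
    """Return (start, end, property, value) for an @missing line, else None."""
    m = MISSING.match(line.strip())
    if not m:
        return None
    start, end, prop, value = m.groups()
    return int(start, 16), int(end, 16), prop, value


def general_category(cp, explicit, defaults):
    """Look up gc: explicit entries win, otherwise fall back to the @missing default."""
    if cp in explicit:
        return explicit[cp]
    for start, end, prop, value in defaults:
        if prop == "General_Category" and start <= cp <= end:
            return ALIAS.get(value, value)
    raise LookupError(f"no default covers U+{cp:04X}")


DEFAULT = parse_missing("# @missing: 0000..10FFFF; General_Category; Unassigned")
# Toy explicit data: only the assigned code points are listed.
EXPLICIT = {0x2E60: "Po", 0x2E61: "Po", 0x2E62: "Ps", 0x2E63: "Pe"}

print(general_category(0x2E5E, EXPLICIT, [DEFAULT]))  # reserved, falls back to the default
print(general_category(0x2E62, EXPLICIT, [DEFAULT]))  # explicitly listed
```

A consumer that honors `@missing` gets the same answer for reserved code points whether or not the Cn lines are printed.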
2E60..2E61 ; Pattern_Syntax # Po [2] WIGGLY EXCLAMATION MARK..INVERTED WIGGLY EXCLAMATION MARK
2E62 ; Pattern_Syntax # Ps LEFT PARENTHESIS WITH MIDDLE RING
2E63 ; Pattern_Syntax # Pe RIGHT PARENTHESIS WITH MIDDLE RING
nifty new Pattern_Syntax ;-)
CI check failures
I created https://github.com/unicode-org/properties/issues/496
The proposal L2/24-277 did include: "The following characters have to get the “soft-dotted” property: ..." so maybe we should just give them Soft_Dotted and move on?
We definitely should, and I do not think we should block this PR on PAG rubberstamping this soft-dottedness. But since an earlier PAG report said something wrong about these characters, it might be useful to correct the record in the next PAG report. I will fill in https://github.com/unicode-org/properties/issues/496 accordingly.
… I am confused by this one.
On my machine it works!?
Any chance that the server sees the Unicode 17 version of Bidi_Paired_Bracket?
Maybe, but how?
maybe comment out this one test and move on for now?
It was yet another static cache (this time of the set of unassigned code points)… 😩 From 2011, with this comment: Making this a bit more robust sure would have been nice…
Poor you... but thanks for chasing this down! Maybe setDefaultXSymbolTable() should just call ResetCacheProperties()?
The former lives in ICU, and the latter in the tools, so not really an option. But I think UnicodeProperty is in dire need of refactoring, and one part of that could be making whatever caching there is correct (or even getting rid of it if it turns out not to be useful). |
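For readers who have not hit this class of bug, here is a toy Python sketch of the failure mode described above: a static cache of "unassigned code points" that is filled once and never invalidated, next to a version-keyed cache that stays correct. Everything here is hypothetical (the data, the version labels, the function names); it does not reproduce the actual UnicodeProperty code or the ICU API.

```python
# Toy data, for illustration only: which code points count as assigned per version.
ASSIGNED = {
    "17.0": set(),
    "18.0": {0x2E60, 0x2E61, 0x2E62, 0x2E63},
}
UNIVERSE = set(range(0x2E5E, 0x2E64))  # U+2E5E..U+2E63

_stale_cache = None  # static cache that ignores which version was asked for


def unassigned_stale(version):
    """Buggy: whatever version is requested first is served forever."""
    global _stale_cache
    if _stale_cache is None:
        _stale_cache = frozenset(UNIVERSE - ASSIGNED[version])
    return _stale_cache


_keyed_cache = {}  # cache keyed by version, so entries cannot leak across versions


def unassigned_keyed(version):
    """Correct: one cached set per version."""
    if version not in _keyed_cache:
        _keyed_cache[version] = frozenset(UNIVERSE - ASSIGNED[version])
    return _keyed_cache[version]


# An earlier test touches the old version first, then a later test asks for the new one.
unassigned_stale("17.0")
STALE_RESULT = unassigned_stale("18.0")
unassigned_keyed("17.0")
KEYED_RESULT = unassigned_keyed("18.0")

print(0x2E60 in STALE_RESULT, 0x2E60 in KEYED_RESULT)
```

The order dependence is what makes this kind of bug show up on a server but not locally: the result depends on which version happened to populate the cache first. Keying the cache by its inputs, or dropping it, removes that dependence.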
[182-C7] Consensus: Provisionally assign 31 code points U+2E60..U+2E63, U+A7DD and U+1DF68..U+1DF81, in the Supplemental Punctuation, Latin Extended-D and Latin Extended-G blocks, to characters for EPA with names and code points as described in Section 2.2 of L2/24-277. [Ref. 1.6 in L2/25-010]
[185-C40] Consensus: UTC accepts for encoding in Unicode 18.0 the following 321 Arabic, Armenian, Bengali, Cuneiform, Devanagari, Hebrew, Kana, Khitan, Latin, Mongolian, Phonetic and other symbol characters for which code points have previously been assigned: