Ambiguities in parsing Sinhala named sequences (Repaya, Yansaya and Rakaaraansaya) #6
Comments
Directly defining a set of prioritized parsing rules can be helpful to eliminate the issue of implementations (because the major implementation today, OpenType, is based on prioritized parsing stages) having to interpret implied logic for marginal cases. (And the easier-to-understand version like “default parse of …” can always be supplied together.)
A typo of swapping. Should be “Ra + Rakaaraansaya”.
Fixed
I don't want to create unnecessary noise, but I am interested in tracking the progress on this topic. It looks like the proposed fixes haven't (yet) appeared in Unicode or the MS docs themselves. But is that even intended to be the path "forward"? Would getting positive feedback from the Local Languages Working Group be sufficient to say "this is how it should be?"
@lianghai Is using ZWNJ to encode 1b above a sensible solution? So the Unicode string to display Ra+Yansaya would be as follows:
Does this mean enabling parsing
There are three Unicode named character sequences for Sinhala, defined in Unicode Standard 6.1.0. Due to the lack of explicit definitions, there are inconsistencies in OpenType shaping. These inconsistencies lead to actual errors, or to users assuming there are errors in the Sinhala Unicode specification, fonts, keyboards and input methods. I am consolidating some of my findings and ideas here.
The three Sinhala named sequences added in Unicode Standard 6.1.0:
To visualise this:
These are not included in the Core Specification at the moment and there are ambiguities in how to parse these.
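For reference, the three sequences can be written out as code points. This is a minimal sketch based on the entries in Unicode's NamedSequences.txt data file; the dictionary name and the `code_points` helper are illustrative and not part of any specification.

```python
# The three Sinhala named sequences, as listed in Unicode's NamedSequences.txt.
SINHALA_NAMED_SEQUENCES = {
    "SINHALA CONSONANT SIGN YANSAYA": "\u0DCA\u200D\u0DBA",
    "SINHALA CONSONANT SIGN RAKAARAANSAYA": "\u0DCA\u200D\u0DBB",
    "SINHALA CONSONANT SIGN REPAYA": "\u0DBB\u0DCA\u200D",
}


def code_points(text):
    """Format a string as space-separated, uppercase hex code points."""
    return " ".join(f"{ord(ch):04X}" for ch in text)


for name, seq in SINHALA_NAMED_SEQUENCES.items():
    print(f"{name}: {code_points(seq)}")
```

Note that Repaya ends with al-lakuna + ZWJ (0DCA 200D) while Yansaya and Rakaaraansaya begin with the same pair, which is where the overlap discussed in this issue comes from.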
The string
0DBB 0DCA 200D 0DBA
( ර + ් + zwj + ය ) can be parsed in two ways: as Repaya + Ya (1a) or as Ra + Yansaya (1b).
NOTE:* A space (0020) has been added between ර and ් for demonstration.
Similarly, the string
0DBB 0DCA 200D 0DBB
(ර + ් + zwj + ර) can be parsed in two ways, referred to below as 2a and 2b.
NOTE:** A space (0020) has been added between ර and ් for demonstration.
NOTE: The syllable r-ra is not a common occurrence in Sinhala.
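The ambiguity in the two strings above can be detected mechanically: it arises whenever two named sequences match overlapping code points. The following is a rough sketch only; the short labels and the function names are made up for illustration and do not describe how any shaping engine works.

```python
# Named sequences from Unicode's NamedSequences.txt; the short labels are illustrative.
NAMED_SEQUENCES = {
    "Repaya": "\u0DBB\u0DCA\u200D",
    "Yansaya": "\u0DCA\u200D\u0DBA",
    "Rakaaraansaya": "\u0DCA\u200D\u0DBB",
}


def find_sequences(text):
    """Return (start, end, label) for every named sequence occurring in text."""
    matches = []
    for label, seq in NAMED_SEQUENCES.items():
        start = text.find(seq)
        while start != -1:
            matches.append((start, start + len(seq), label))
            start = text.find(seq, start + 1)
    return sorted(matches)


def is_ambiguous(text):
    """True if two matches overlap, i.e. compete for the same code points."""
    matches = find_sequences(text)
    return any(prev[1] > nxt[0] for prev, nxt in zip(matches, matches[1:]))


ra_ya = "\u0DBB\u0DCA\u200D\u0DBA"  # 0DBB 0DCA 200D 0DBA
ra_ra = "\u0DBB\u0DCA\u200D\u0DBB"  # 0DBB 0DCA 200D 0DBB
ka_ya = "\u0D9A\u0DCA\u200D\u0DBA"  # 0D9A 0DCA 200D 0DBA (Ka + Yansaya only)

print(find_sequences(ra_ya))
print(is_ambiguous(ra_ya), is_ambiguous(ra_ra), is_ambiguous(ka_ya))
```

Both strings starting with Ra are reported as ambiguous, while a Yansaya on a different base consonant is not, since no Repaya can match there.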
There is no explicit description of how these two strings should be handled, either in the named sequences proposal or in the SLS 1134:2011 (2011 revision) specification.
However, SLS 1134:2011 Section 5.9 (p. 22), on Repaya, has the following explanation:
NOTE: Screenshot from the PDF to avoid Sinhala string display errors
The wording and the example do not resolve the ambiguity explicitly, because they refer to another special case with the visual form ‘yansaya with a repaya’.
We could consider that SLS 1134:2011 Section 5.9 implies that 0DBB 0DCA 200D 0DBA should be parsed as 1a, also consider that Ra+Yansaya is linguistically incorrect, and make 1a the default parse. However, if we update the specification to make 1a the default parse, it raises another issue: how do we encode the Ra+Yansaya combination? We need to be able to display ‘things that should not exist’ or ‘incorrect strings’ in linguistic or technical contexts.
When it comes to 0DBB 0DCA 200D 0DBB (ර + ් + zwj + ර), there are no references in SLS 1134. However, 2b might be desirable as the default parse over 2a. Both R-Ra and Ra-Ra (i.e. ක්රර) are not common syllables in practice, and both (ර්ර) and (්රර) are uncommon occurrences. But that is up to the linguists to decide.
Following is a solution:
A. Define the default parse of 0DBB 0DCA 200D 0DBA ( ර + ් + zwj + ය ) as Repaya + Ya (1a above).
B. Define a way to encode Ra+Yansaya.
C. Define a default parse for 0DBB 0DCA 200D 0DBB (ර + ් + zwj + ර) and use the same strategy as B to encode the other possibility. Which one is the default is a question we can ask the Local Languages Working Group of Sri Lanka.
D. Update HarfBuzz to the new spec.
E. Update all fonts to the new spec.
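Steps A and C amount to choosing a priority order among the named sequences. As a rough sketch of how such a prioritized rule could be expressed, the following applies rules in stages, highest priority first, and lets each stage claim only code points that no earlier stage has taken. The rule lists, labels and `parse` function are hypothetical; this is not HarfBuzz or OpenType code, and it does not address step B.

```python
REPAYA = "\u0DBB\u0DCA\u200D"
YANSAYA = "\u0DCA\u200D\u0DBA"
RAKAARAANSAYA = "\u0DCA\u200D\u0DBB"

# Illustrative labels for the unclaimed base letters.
BASE_NAMES = {"\u0DBB": "Ra", "\u0DBA": "Ya"}

# Hypothetical priority orders; earlier rules win.
REPAYA_FIRST = [("Repaya", REPAYA), ("Yansaya", YANSAYA),
                ("Rakaaraansaya", RAKAARAANSAYA)]
REPAYA_LAST = [("Yansaya", YANSAYA), ("Rakaaraansaya", RAKAARAANSAYA),
               ("Repaya", REPAYA)]


def parse(text, rules):
    """Apply rules in priority order; each may claim only unclaimed code points."""
    claimed = [False] * len(text)
    spans = []
    for label, seq in rules:
        start = text.find(seq)
        while start != -1:
            end = start + len(seq)
            if not any(claimed[start:end]):
                claimed[start:end] = [True] * len(seq)
                spans.append((start, end, label))
                start = text.find(seq, end)
            else:
                start = text.find(seq, start + 1)
    # Anything left over becomes a single-code-point token.
    for i, ch in enumerate(text):
        if not claimed[i]:
            spans.append((i, i + 1, BASE_NAMES.get(ch, f"U+{ord(ch):04X}")))
    return [label for _, _, label in sorted(spans)]


ra_ya = "\u0DBB\u0DCA\u200D\u0DBA"  # 0DBB 0DCA 200D 0DBA
ra_ra = "\u0DBB\u0DCA\u200D\u0DBB"  # 0DBB 0DCA 200D 0DBB

print(parse(ra_ya, REPAYA_FIRST))  # Repaya + Ya
print(parse(ra_ya, REPAYA_LAST))   # Ra + Yansaya
print(parse(ra_ra, REPAYA_FIRST))  # Repaya + Ra
print(parse(ra_ra, REPAYA_LAST))   # Ra + Rakaaraansaya
```

With a single global order, Repaya either wins or loses for both strings. If the two strings should end up with different defaults, the rules would need to be more specific than a plain ordering of the three sequences.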