Support Unicode in "Name" Token #68

@DER-SSt

Description

Is your feature request related to a problem? Please describe.

Current Definition of "Name" Token

Only ASCII letters (a-z, A-Z), '_' and '$' are allowed, plus the digits 0-9 after the first character.

token Name =
( 'a'..'z' | 'A'..'Z' | '_' | '$' )
( 'a'..'z' | 'A'..'Z' | '_' | '0'..'9' | '$' )*;
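To try the current restriction outside of MontiCore, the token can be mirrored as a Java regular expression. This is a minimal sketch: the class `AsciiNameCheck` and the method `isAsciiName` are hypothetical helpers, not part of MontiCore, and the regex is a hand translation of the rule above.

```java
import java.util.regex.Pattern;

// Hypothetical helper: mirrors the current Name token as a Java regex.
final class AsciiNameCheck {

    // First character: a-z, A-Z, '_' or '$'; following characters may also be 0-9.
    private static final Pattern ASCII_NAME =
            Pattern.compile("[a-zA-Z_$][a-zA-Z_0-9$]*");

    static boolean isAsciiName(String s) {
        return ASCII_NAME.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] candidates = {"Kaese", "Käse", "flüßig", "Époisses", "_x1", "1x"};
        for (String candidate : candidates) {
            System.out.println(candidate + " -> " + isAsciiName(candidate));
        }
    }
}
```

Running it prints `true` for `Kaese` and `_x1`, and `false` for `Käse`, `flüßig`, `Époisses` and `1x`.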

Limitation for UML-Languages

The Name token is used in nearly all MontiCore languages. Its restriction to ASCII characters makes it harder for users to describe their problem in their own language.

For example, class diagrams (CDs):

class Käse {         // "ä" not allowed
  boolean flüßig;    // "ü" and "ß" not allowed
}

or object diagrams (ODs):

object Époisses: Käse {    // "É"
  flüßig = false; 
}

and so on.

Limitation for General Languages

Many general-purpose languages define names much more broadly. A MontiCore grammar for such a language is therefore either more restrictive than the language itself and cannot parse all valid inputs, or it redefines the Name token, which makes it hard to combine with other MontiCore languages.

Java:

https://docs.oracle.com/javase/specs/jls/se23/html/jls-3.html#jls-3.8

Letters and digits may be drawn from the entire Unicode character set, which supports most writing scripts in use in the world today, including the large sets for Chinese, Japanese, and Korean. This allows programmers to use identifiers in their programs that are written in their native languages.

The Java specification explicitly lists αρετη as an example of a valid identifier.
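Java's identifier characters can be checked with `Character.isJavaIdentifierStart(int)` and `Character.isJavaIdentifierPart(int)`. The sketch below (the class `JavaIdentifierCheck` is hypothetical) iterates over code points rather than `char`s so that supplementary characters are handled. Unlike a full JLS check, it does not reject keywords or the literals `true`, `false` and `null`.

```java
// Hypothetical sketch: applies Java's identifier character rules (JLS 3.8)
// code point by code point. Keywords are NOT excluded here.
final class JavaIdentifierCheck {

    static boolean isJavaIdentifier(String s) {
        if (s.isEmpty() || !Character.isJavaIdentifierStart(s.codePointAt(0))) {
            return false;
        }
        // Skip the first code point (one or two chars) and check the rest.
        return s.codePoints()
                .skip(1)
                .allMatch(Character::isJavaIdentifierPart);
    }

    public static void main(String[] args) {
        String[] candidates = {"αρετη", "Käse", "flüßig", "Époisses", "i3", "3i"};
        for (String candidate : candidates) {
            System.out.println(candidate + " -> " + isJavaIdentifier(candidate));
        }
    }
}
```

With these rules `αρετη`, `Käse`, `flüßig`, `Époisses` and `i3` are accepted, while `3i` is rejected because it starts with a digit.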

XML

XML also allows Unicode characters in names. As a consequence, the MontiCore XML language overrides the Name token:
https://github.com/MontiCore/xml/blob/ed432849540eab55c952aabfa748b923c541b55c/src/main/grammars/de/monticore/lang/XMLBasis.mc4#L22-L47

Describe the solution you'd like

Allow Unicode characters in the Name token in MCBasics.mc4. This lets developers create models closer to their native language and ensures that general-purpose languages such as Java and XML can be parsed without overriding the Name token.

Unicode Standard Annex #31 (Unicode Identifiers and Syntax) defines a Unicode identifier standard that can serve as a language-independent basis: https://www.unicode.org/reports/tr31/

The Java runtime also supports this standard via Character.isUnicodeIdentifierStart(int) and Character.isUnicodeIdentifierPart(int): https://docs.oracle.com/en/java/javase/23/docs/api/java.base/java/lang/Character.html#isUnicodeIdentifierStart(int)
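A Unicode-aware Name rule could build on these two methods. The following is a hypothetical sketch, not a proposed MontiCore implementation (the class `UnicodeNameCheck` and its methods are illustrative). `isUnicodeIdentifierStart` does not accept `_` or `$`, and `isUnicodeIdentifierPart` does not accept `$`, so both are added explicitly to stay compatible with the current token. Keeping `$` and rejecting identifier-ignorable code points are assumptions of this sketch.

```java
// Hypothetical sketch of a Unicode-aware Name rule based on UAX #31
// (via java.lang.Character), extended with '_' and '$' for compatibility
// with the current ASCII Name token.
final class UnicodeNameCheck {

    static boolean isNameStart(int cp) {
        return Character.isUnicodeIdentifierStart(cp) || cp == '_' || cp == '$';
    }

    static boolean isNamePart(int cp) {
        // isUnicodeIdentifierPart also accepts "identifier-ignorable" code points
        // (e.g. most control characters); this sketch rejects them explicitly.
        if (Character.isIdentifierIgnorable(cp)) {
            return false;
        }
        return Character.isUnicodeIdentifierPart(cp) || cp == '$';
    }

    static boolean isUnicodeName(String s) {
        if (s.isEmpty() || !isNameStart(s.codePointAt(0))) {
            return false;
        }
        // Check every code point after the first one.
        return s.codePoints().skip(1).allMatch(UnicodeNameCheck::isNamePart);
    }

    public static void main(String[] args) {
        String[] candidates = {"Käse", "flüßig", "Époisses", "αρετη", "_tmp", "$x", "2fast", "a-b"};
        for (String candidate : candidates) {
            System.out.println(candidate + " -> " + isUnicodeName(candidate));
        }
    }
}
```

`Käse`, `flüßig`, `Époisses`, `αρετη`, `_tmp` and `$x` are accepted; `2fast` (leading digit) and `a-b` (`-` is not an identifier character) are rejected.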

Describe alternatives you've considered

No response

Additional context

No response
