From 5c3fafd372117c9dd1cf807a209f9a57e152adcb Mon Sep 17 00:00:00 2001
From: Justin Lu
Date: Mon, 20 Oct 2025 15:50:48 -0700
Subject: [PATCH 1/3] init

---
 .../share/classes/java/util/Locale.java | 43 +++++++++++++------
 1 file changed, 31 insertions(+), 12 deletions(-)

diff --git a/src/java.base/share/classes/java/util/Locale.java b/src/java.base/share/classes/java/util/Locale.java
index a55ddee648e46..cb8149a08637e 100644
--- a/src/java.base/share/classes/java/util/Locale.java
+++ b/src/java.base/share/classes/java/util/Locale.java
@@ -204,15 +204,18 @@
 * key="x"/value="java-1-7"
 *
 *
- * BCP 47 deviation: Although BCP 47 requires field values to be registered
- * in the IANA Language Subtag Registry, the {@code Locale} class
- * does not validate this requirement. For example, the variant code "foobar"
- * is well-formed since it is composed of 5 to 8 alphanumerics, but is not defined
- * the IANA Language Subtag Registry. The {@link Builder}
- * only checks if an individual field satisfies the syntactic
- * requirement (is well-formed), but does not validate the value
- * itself. Conversely, {@link #of(String, String, String) Locale::of} and its
- * overloads do not make any syntactic checks on the input.
+ * BCP 47 deviation: BCP 47 defines the following two levels of
+ * conformance,
+ * "valid" and "well-formed". A valid tag must be well-formed, its
+ * subtag values must be registered in the IANA Language Subtag Registry, and it must not
+ * contain duplicate variant or extension singleton subtags. The {@code Locale}
+ * class does not enforce that subtags are registered in the Subtag Registry.
+ * {@link Builder} only checks if an individual field satisfies the syntactic
+ * requirement (is well-formed). When passed duplicate variants, {@code Builder}
+ * accepts and includes them. When passed duplicate extension singletons, {@code
+ * Builder} accepts but ignores the duplicate key and its associated value.
+ * Conversely, {@link #of(String, String, String) Locale::of} and its
+ * overloads do not check if the input is well-formed at all.
 * Unicode BCP 47 U Extension
 *
@@ -246,7 +249,10 @@
 * can be empty, or a series of subtags 3-8 alphanums in length). A
 * well-formed locale attribute has the form
 * {@code [0-9a-zA-Z]{3,8}} (it is a single subtag with the same
- * form as a locale type subtag).
+ * form as a locale type subtag). {@code Locale} does not enforce uniqueness of
+ * locale keys or attributes. For methods in {@code Locale} and {@code Locale.Builder}
+ * that accept extensions, duplicate locale attributes, as well as duplicate
+ * locale keys and their associated types, are accepted but ignored.
 * The Unicode locale extension specifies optional behavior in
 * locale-sensitive services. Although the LDML specification defines
@@ -1743,6 +1749,12 @@ public static String caseFoldLanguageTag(String languageTag) {
 * to {@link Locale.Builder#setLanguageTag(String)} which throws an exception
 * in this case.
 *
+ * Duplicate variants are accepted and included by the builder.
+ * However, duplicate extension singleton keys and their associated values
+ * are accepted but ignored. The same behavior applies to duplicate locale
+ * keys and attributes within a U extension. Note that subtags appearing
+ * after a duplicate and its associated values are not ignored.
+ *
 * The following conversions are performed:
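The duplicate-handling rules documented in this patch can be exercised with a small program. This is a sketch, not part of the patch: it assumes JDK 19 or later (for `Locale.of`) and that `Locale.Builder#setLanguageTag` behaves as the javadoc text describes. The class name `DuplicateSubtagsDemo` and its helper `build` are hypothetical names chosen for illustration.

```java
import java.util.Locale;

// Hypothetical demo class, not part of the patch. Assumes JDK 19+ for Locale.of.
public class DuplicateSubtagsDemo {

    // Parses a well-formed BCP 47 tag through Locale.Builder#setLanguageTag.
    static Locale build(String tag) {
        return new Locale.Builder().setLanguageTag(tag).build();
    }

    public static void main(String[] args) {
        // Duplicate variants: both "foobar" subtags are kept in the variant field.
        Locale variants = build("en-US-foobar-foobar");
        System.out.println(variants.getVariant());

        // Duplicate extension singleton 'a': the first occurrence is kept and
        // "a-bar" is dropped, but the later 'b' extension is still honored.
        Locale singletons = build("en-a-foo-a-bar-b-baz");
        System.out.println(singletons.getExtension('a'));
        System.out.println(singletons.getExtension('b'));

        // Duplicate U extension key "ca": the first type is kept; "nu" after it survives.
        Locale keys = build("en-u-ca-japanese-ca-buddhist-nu-thai");
        System.out.println(keys.getUnicodeLocaleType("ca"));
        System.out.println(keys.getUnicodeLocaleType("nu"));

        // Duplicate U extension attribute: stored only once.
        Locale attrs = build("en-u-attr-attr");
        System.out.println(attrs.getUnicodeLocaleAttributes());

        // Locale.of performs no well-formedness check; the variant is stored as given.
        Locale unchecked = Locale.of("en", "US", "not a variant!");
        System.out.println(unchecked.getVariant());
    }
}
```

Every tag above goes through `setLanguageTag`, so it must first pass the syntactic (well-formed) check. Duplicates are still well-formed, which is why they reach the accept-but-ignore path instead of raising `IllformedLocaleException`. The file can be run directly with `java DuplicateSubtagsDemo.java`.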