This repository has been archived by the owner on May 17, 2023. It is now read-only.

Changes in response to S. Farrell comments
Clarifying language on rules, collision checking and well-formed
documents, incl. required ordering.
asmusf committed Apr 28, 2016
1 parent 9838ed5 commit e7aa58f
Showing 1 changed file with 52 additions and 27 deletions.
79 changes: 52 additions & 27 deletions draft-ietf-lager-specification.xml
@@ -166,7 +166,8 @@

<section title="LGR Format">

<t>An LGR is expressed as a well-formed XML Document <xref target="XML"/>.</t>
<t>An LGR is expressed as a well-formed XML Document <xref target="XML"/>
that conforms to the schema defined in <xref target="schema"/>.</t>

<t>As XML is case-sensitive, an LGR must be authored with the correct
casing. For example, the XML element names must be in lower
@@ -221,6 +222,10 @@
contain zero or one "meta" element, exactly one "data" element, and
zero or one "rules" element; and these three elements MUST be in that order.</t>

<t>Most elements that are direct or nested child elements of the "rules" element
MUST be placed in a specific relative order to other elements for the LGR to be valid.
An LGR that violates these constraints MUST be rejected. </t>
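
Reviewer sketch (not part of the draft): the top-level ordering constraint stated above could be checked mechanically along the following lines. This is a minimal illustration, assuming the document is parsed with Python's standard `xml.etree.ElementTree`; the helper names `local_name` and `top_level_order_ok` are hypothetical, and the sketch checks only the order of "meta", "data" and "rules", not the ordering constraints inside "rules" or conformance to the schema.

```python
import xml.etree.ElementTree as ET

# Required relative order of the direct children of the root element.
EXPECTED_ORDER = ["meta", "data", "rules"]


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]


def top_level_order_ok(xml_text):
    """Return True if the root has zero or one "meta", exactly one "data",
    and zero or one "rules" child, in that order.

    ET.fromstring raises ParseError if the input is not well-formed XML.
    """
    root = ET.fromstring(xml_text)
    names = [local_name(child.tag) for child in root]
    if names.count("data") != 1:
        return False
    if any(name not in EXPECTED_ORDER for name in names):
        return False
    if len(names) != len(set(names)):
        return False  # some element is repeated
    # The elements that are present must follow the required order.
    return names == [name for name in EXPECTED_ORDER if name in names]
```

For example, `top_level_order_ok("<lgr><data/><meta/></lgr>")` returns `False`, because "meta" follows "data".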

<t>In the following descriptions, required, non-repeating elements or attributes are
generally not called out explicitly, in contrast to "OPTIONAL" ones,
or those that "MAY" be repeated. For attributes that take lists as values, the elements MUST be
@@ -443,7 +448,9 @@
<t>
<figure>
<artwork><![CDATA[ <references>
<reference id="0">The Unicode Standard, Version 7.0</reference>
<reference id="0">The Unicode Consortium. The Unicode Standard, Version 8.0.0,
(Mountain View, CA: The Unicode Consortium, 2015. ISBN 978-1-936213-10-8)
http://www.unicode.org/versions/Unicode8.0.0/</reference>
<reference id="1">Big-5: Computer Chinese Glyph and Character
Code Mapping Table, Technical Report C-26, 1984</reference>
<reference id="2" comment="synchronized with Unicode 6.1">
@@ -477,7 +484,7 @@

<t>The code point data is collected within the "data" element. Within this element, a
series of "char" and "range" elements describe eligible code points, or ranges of
code points, respectively.</t>
code points, respectively. Collectively, these are known as the repertoire.</t>

<t>Discrete permissible code points or code point sequences (see
<xref target="sequences" />) are declared with a "char"
@@ -670,7 +677,9 @@
</t>
<t>Variant relations are normally not only symmetric, but also transitive.
If A is a variant of B and B is a variant of C, then A is also a variant of C.
As with symmetry, these transitive relations are spelled out explicitly in the LGR.</t>
As with symmetry, these transitive relations are only part of the LGR if
spelled out explicitly. Implementations that require an LGR to be symmetric
and transitive should verify this mechanically.</t>
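
Reviewer sketch (not part of the draft): one way an implementation might "verify this mechanically" is to list the mappings that symmetry and transitivity would require but that are not spelled out. The function name `missing_mappings` and the representation of mappings as `(source, target)` pairs are assumptions made for illustration; the "when" and "not-when" attributes and reflexive mappings are deliberately ignored here.

```python
def missing_mappings(mappings):
    """Return the (source, target) pairs that would have to be added for the
    given variant mappings to be symmetric and transitive.

    An empty result means the mappings already satisfy both properties
    (reflexive pairs such as A -> A are not required by this check).
    """
    pairs = set(mappings)
    missing = set()
    # Symmetry: every A -> B needs a matching B -> A.
    for a, b in pairs:
        if (b, a) not in pairs:
            missing.add((b, a))
    # Transitivity: A -> B and B -> C together need A -> C.
    for a, b in pairs:
        for c, d in pairs:
            if b == c and a != d and (a, d) not in pairs:
                missing.add((a, d))
    return missing
```

The nested loop is quadratic in the number of mappings, which is likely acceptable for a one-time check when an LGR is loaded.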

<t>All variant mappings are unique. For a given "char" element all "var" elements
MUST have a unique combination of "cp", "when" and "not-when" attributes.
@@ -968,15 +977,23 @@
<section title="Whole Label and Context Evaluation">

<section title="Basic Concepts">
<t>The code points in a label sometimes need to satisfy context-based rules, for
example for the label to be considered valid, or to satisfy the context for a
variant mapping (see the description of the "when" attribute in <xref
target="parameterized_context_rule"/>).</t>
<t>The "rules" element contains the specification of both context-based and
Whole Label Evaluation (WLE) rules (<xref target="whole_label" />), the character
classes (<xref target="character_classes" />) that they depend on
and any actions (<xref target="actions"/>) that assign dispositions to labels
based on rules or variant mappings.</t>

<t>A Whole Label Evaluation rule (WLE) is applied to the whole label. It is used to
validate both original labels and variant labels computed from them using a
permutation over all applicable variant mappings. A conditional context rule is
a specialized form of WLE specific to the context around a single code point or
code point sequence. For example, if a rule is referenced in the "when"
validate both original labels and any variant labels computed from them. </t>

<t>A conditional context rule does not necessarily
apply to the whole label, but may be specific to the context around a single code
point or code point sequence. Certain code points in a label sometimes need to
satisfy context-based rules, for example for the label to be considered valid, or
to satisfy the context for a variant mapping (see the description of the "when"
attribute in <xref target="parameterized_context_rule"/>). </t>

<t>For example, if a rule is referenced in the "when"
attribute of a variant mapping it is used to describe the conditional context
under which the particular variant mapping is defined to exist.</t>

@@ -999,7 +1016,7 @@
all of the constraints defined here are validated by the schema.</t>
</section>

<section title="Character Classes">
<section title="Character Classes" anchor="character_classes">
<t>Character classes are sets of characters that often share a particular property.
While they function like sets in every way, even supporting the usual set
operators, they are called character classes here in a nod to the use of that
@@ -2071,7 +2088,8 @@
</list>
</t>
<t>The number of potential permutations can be very large. In practice, implementations
would use suitable optimizations to avoid having to actually create all permutations.</t>
would use suitable optimizations to avoid having to actually create all permutations
(see <xref target="collision" />). </t>

<t>In determining the permuted set of variant labels in step (1) above, all eligible
partitions into sequences must be evaluated. A label "ab" that matches a sequence "ab"
@@ -2179,19 +2197,25 @@

<t>Because of symmetry and transitivity, all variant mappings form disjoint sets.
In each of these sets, the source and target of each mapping are also variants
of the sources and targets of all the other mappings. As a consequence, if two labels
have code points at the same position from two different of these variant mapping sets,
the sets of their variant labels are likewise disjoint.</t>

<t>Instead of generating all permutations, that is, using each variant mapping in each
set at a particular code position in the label, it is sufficient to substitute an "index" mapping,
in effect identifying the set of variant code points for that position. Such an index mapping
could be, for example, the variant mapping for which the target code point (or sequence)
comes first in some sorting order.</t>
of the sources and targets of all the other mappings. However, members of
two different sets are never variants of each other.</t>

<t>If two labels have code points at the same position that are members of two
different variant mapping sets, no variant label of one can be a
variant label of the other: the sets of their variant labels are likewise disjoint.
Instead of generating all permutations to compare all possible variants, it is
enough to find out whether code points at the same position belong to the
same variant set or not.</t>

<t>For that, it is sufficient to substitute an "index" mapping that identifies the
set. This index mapping could be, for
example, the variant mapping for which the target code point (or sequence)
comes first in some sorting order. This index mapping would, in effect, identify
the set of variant mappings for that position. </t>

<t>To check collision then means generating a single variant label from the original
by substituting the "index" value as the target for mapping from any code
point. This results in an "index label". Two labels collide whenever the index labels
by substituting the respective "index" value for each code point. This results in an
"index label". Two labels collide whenever the index labels
for them are the same.</t>
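
Reviewer sketch (not part of the draft): the index-label collision check described above might look roughly like this. It assumes that labels are Python strings, that only single code points (no sequences) have variants, and that the index of a variant set is simply its smallest member in code point order. The names `build_index`, `index_label` and `collide` are hypothetical.

```python
def build_index(mappings):
    """Map each code point to the index of its variant set.

    The index is the smallest code point in the set. Sets are built with a
    small union-find structure from (source, target) mapping pairs.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for source, target in mappings:
        root_a, root_b = find(source), find(target)
        if root_a != root_b:
            # Keep the smaller code point as the representative of the set.
            if root_b < root_a:
                root_a, root_b = root_b, root_a
            parent[root_b] = root_a
    return {cp: find(cp) for cp in list(parent)}


def index_label(label, index):
    """Substitute the index value for every code point of the label."""
    return tuple(index.get(cp, cp) for cp in label)


def collide(label_a, label_b, index):
    """Two labels collide whenever their index labels are the same."""
    return index_label(label_a, index) == index_label(label_b, index)
```

With the illustrative mappings `[("o", "0"), ("0", "o"), ("l", "1"), ("1", "l")]`, the labels "lol" and "101" share one index label and therefore collide, while "lol" and "lal" do not. No variant labels are generated at any point.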
</section>

@@ -2955,7 +2979,7 @@ U+6F27;U+4E7E;U+6F27;U+4E81,U+5E72,U+5E79,U+69A6]]></artwork>
</t>
</section>

<section title="RelaxNG Compact Schema">
<section title="RelaxNG Compact Schema" anchor="schema">
<figure>
<artwork><![CDATA[
<CODE BEGINS>
@@ -3152,7 +3176,8 @@ U+6F27;U+4E7E;U+6F27;U+4E81,U+5E72,U+5E79,U+69A6]]></artwork>
<list style="hanging" hangIndent="5">
<t hangText="draft-ietf-lager-specification-12">
Integrate additional feedback from AD review. Use domain names for the prefixes
in private dispositions to reduce potential conflicts.
in private dispositions to reduce potential conflicts. Add clarifying language on
ordering, well-formedness, collision checking and rules.
</t>
</list>

