Merge pull request #1634 from chorman0773/spec-add-identifiers-macro-ambiguity

traviscross · web-flow · commit 889a660f0d1b · 2024-10-21T16:40:48.000Z
Add identifier syntax to macro-ambiguity.md
diff --git a/src/macro-ambiguity.md b/src/macro-ambiguity.md
@@ -1,11 +1,16 @@
 # Appendix: Macro Follow-Set Ambiguity Formal Specification
 
+r[macro.ambiguity]
+
 This page documents the formal specification of the follow rules for [Macros
 By Example]. They were originally specified in [RFC 550], from which the bulk
 of this text is copied, and expanded upon in subsequent RFCs.
 
 ## Definitions & Conventions
 
+r[macro.ambiguity.convention]
+
+r[macro.ambiguity.convention.defs]
   - `macro`: anything invokable as `foo!(...)` in source code.
   - `MBE`: macro-by-example, a macro defined by `macro_rules`.
   - `matcher`: the left-hand-side of a rule in a `macro_rules` invocation, or a
@@ -46,11 +51,13 @@ macro_rules! i_am_an_mbe {
 }
 ```
 
+r[macro.ambiguity.convention.matcher]
 `(start $foo:expr $($i:ident),* end)` is a matcher. The whole matcher is a
 delimited sequence (with open- and close-delimiters `(` and `)`), and `$foo`
 and `$i` are simple NT's with `expr` and `ident` as their respective fragment
 specifiers.
 
+r[macro.ambiguity.convention.complex-nt]
 `$($i:ident),*` is *also* an NT; it is a complex NT that matches a
 comma-separated repetition of identifiers. The `,` is the separator token for
 the complex NT; it occurs in between each pair of elements (if any) of the
@@ -65,16 +72,19 @@ token.
 proper nesting of token tree structure and correct matching of open- and
 close-delimiters.)
 
+r[macro.ambiguity.convention.vars]
 We will tend to use the variable "M" to stand for a matcher, variables "t" and
 "u" for arbitrary individual tokens, and the variables "tt" and "uu" for
 arbitrary token trees. (The use of "tt" does present potential ambiguity with
 its additional role as a fragment specifier; but it will be clear from context
 which interpretation is meant.)
 
+r[macro.ambiguity.convention.set]
 "SEP" will range over separator tokens, "OP" over the repetition operators
 `*`, `+`, and `?`, "OPEN"/"CLOSE" over matching token pairs surrounding a
 delimited sequence (e.g. `[` and `]`).
 
+r[macro.ambiguity.convention.sequence-vars]
 Greek letters "α" "β" "γ" "δ"  stand for potentially empty token-tree sequences.
 (However, the Greek letter "ε" (epsilon) has a special role in the presentation
 and does not stand for a token-tree sequence.)
@@ -101,6 +111,9 @@ purposes of the formalism, we will treat `$v:vis` as actually being
 
 ### The Matcher Invariants
 
+r[macro.ambiguity.invariant]
+
+r[macro.ambiguity.invariant.list]
 To be valid, a matcher must meet the following three invariants. The definitions
 of FIRST and FOLLOW are described later.
 
@@ -112,18 +125,21 @@ of FIRST and FOLLOW are described later.
 1.  For an unseparated complex NT in a matcher, `M = ... $(tt ...) OP ...`, if
     OP = `*` or `+`, we must have FOLLOW(`tt ...`) ⊇ FIRST(`tt ...`).
 
+r[macro.ambiguity.invariant.follow-matcher]
 The first invariant says that whatever actual token comes after a matcher,
 if any, must be somewhere in the predetermined follow set. This ensures that a
 legal macro definition will continue to assign the same determination as to
 where `... tt` ends and `uu ...` begins, even as new syntactic forms are added
 to the language.
 
+r[macro.ambiguity.invariant.separated-complex-nt]
 The second invariant says that a separated complex NT must use a separator token
 that is part of the predetermined follow set for the internal contents of the
 NT. This ensures that a legal macro definition will continue to parse an input
 fragment into the same delimited sequence of `tt ...`'s, even as new syntactic
 forms are added to the language.
 
+r[macro.ambiguity.invariant.unseparated-complex-nt]
 The third invariant says that when we have a complex NT that can match two or
 more copies of the same thing with no separation in between, it must be
 permissible for them to be placed next to each other as per the first invariant.
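
As a rough illustration of the second and third invariants, the sketch below defines two macros whose matchers satisfy them. The names `sum!` and `count_idents!` are hypothetical and do not appear in the reference; the comment about the rejected matcher describes expected compiler behavior rather than a quoted diagnostic.

```rust
// Hypothetical macros, for illustration only.

// Separated complex NT `$($e:expr),*`: the separator `,` is in FOLLOW(expr),
// so the second invariant holds.
macro_rules! sum {
    ($($e:expr),*) => { 0 $(+ $e)* };
}

// Unseparated complex NT `$($i:ident)*`: FOLLOW(ident) is ANYTOKEN, which
// contains FIRST(`$i:ident`), so the third invariant holds.
macro_rules! count_idents {
    ($($i:ident)*) => { <[&str]>::len(&[$(stringify!($i)),*]) };
}

// By contrast, an unseparated `$($e:expr)*` is expected to be rejected,
// because FOLLOW(expr) does not contain FIRST(`$e:expr`).

fn main() {
    assert_eq!(sum!(1, 2, 3), 6);
    assert_eq!(count_idents!(a b c), 3);
}
```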
@@ -137,6 +153,9 @@ invalid in a future edition of Rust. See the [tracking issue].**
 
 ### FIRST and FOLLOW, informally
 
+r[macro.ambiguity.sets]
+
+r[macro.ambiguity.sets.intro]
 A given matcher M maps to three sets: FIRST(M), LAST(M) and FOLLOW(M).
 
 Each of the three sets is made up of tokens. FIRST(M) and LAST(M) may also
@@ -145,12 +164,15 @@ can match the empty fragment. (But FOLLOW(M) is always just a set of tokens.)
 
 Informally:
 
+r[macro.ambiguity.sets.first]
   * FIRST(M): collects the tokens potentially used first when matching a
     fragment to M.
 
+r[macro.ambiguity.sets.last]
   * LAST(M): collects the tokens potentially used last when matching a fragment
     to M.
 
+r[macro.ambiguity.sets.follow]
   * FOLLOW(M): the set of tokens allowed to follow immediately after some
     fragment matched by M.
 
@@ -163,6 +185,7 @@ Informally:
 
       * The concatenation α β γ δ is a parseable Rust program.
 
+r[macro.ambiguity.sets.universe]
 We use the shorthand ANYTOKEN to denote the set of all tokens (including simple
 NTs). For example, if any token is legal after a matcher M, then FOLLOW(M) =
 ANYTOKEN.
@@ -174,18 +197,27 @@ definitions.)
 
 ### FIRST, LAST
 
+r[macro.ambiguity.sets.def]
+
+r[macro.ambiguity.sets.def.intro]
 Below are formal inductive definitions for FIRST and LAST.
 
+r[macro.ambiguity.sets.def.notation]
 "A ∪ B" denotes set union, "A ∩ B" denotes set intersection, and "A \ B"
 denotes set difference (i.e. all elements of A that are not present in B).
 
 #### FIRST
 
+r[macro.ambiguity.sets.def.first]
+
+r[macro.ambiguity.sets.def.first.intro]
 FIRST(M) is defined by case analysis on the sequence M and the structure of its
 first token-tree (if any):
 
+r[macro.ambiguity.sets.def.first.epsilon]
   * if M is the empty sequence, then FIRST(M) = { ε },
 
+r[macro.ambiguity.sets.def.first.token]
   * if M starts with a token t, then FIRST(M) = { t },
 
     (Note: this covers the case where M starts with a delimited token-tree
@@ -195,6 +227,7 @@ first token-tree (if any):
     (Note: this critically relies on the property that no simple NT matches the
     empty fragment.)
 
+r[macro.ambiguity.sets.def.first.complex]
   * Otherwise, M is a token-tree sequence starting with a complex NT: `M = $( tt
     ... ) OP α`, or `M = $( tt ... ) SEP OP α`, (where `α` is the (potentially
     empty) sequence of token trees for the rest of the matcher).
@@ -229,12 +262,18 @@ with respect to \varepsilon as well.
 
 #### LAST
 
+r[macro.ambiguity.sets.def.last]
+
+r[macro.ambiguity.sets.def.last.intro]
 LAST(M) is defined by case analysis on M itself (a sequence of token-trees):
 
+r[macro.ambiguity.sets.def.last.empty]
   * if M is the empty sequence, then LAST(M) = { ε }
 
+r[macro.ambiguity.sets.def.last.token]
   * if M is a singleton token t, then LAST(M) = { t }
 
+r[macro.ambiguity.sets.def.last.rep-star]
   * if M is the singleton complex NT repeating zero or more times, `M = $( tt
     ... ) *`, or `M = $( tt ... ) SEP *`
 
@@ -245,6 +284,7 @@ LAST(M), defined by case analysis on M itself (a sequence of token-trees):
       * otherwise, the sequence `tt ...` must be non-empty; LAST(M) = LAST(`tt
         ...`) ∪ {ε}.
 
+r[macro.ambiguity.sets.def.last.rep-plus]
   * if M is the singleton complex NT repeating one or more times, `M = $( tt ...
     ) +`, or `M = $( tt ... ) SEP +`
 
@@ -255,12 +295,15 @@ LAST(M), defined by case analysis on M itself (a sequence of token-trees):
       * otherwise, the sequence `tt ...` must be non-empty; LAST(M) = LAST(`tt
         ...`)
 
+r[macro.ambiguity.sets.def.last.rep-question]
   * if M is the singleton complex NT repeating zero or one time, `M = $( tt ...)
     ?`, then LAST(M) = LAST(`tt ...`) ∪ {ε}.
 
+r[macro.ambiguity.sets.def.last.delim]
   * if M is a delimited token-tree sequence `OPEN tt ... CLOSE`, then LAST(M) =
     { `CLOSE` }.
 
+r[macro.ambiguity.sets.def.last.sequence]
   * if M is a non-empty sequence of token-trees `tt uu ...`,
 
       * If ε ∈ LAST(`uu ...`), then LAST(M) = LAST(`tt`) ∪ (LAST(`uu ...`) \ { ε }).
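
The case analysis for LAST can be sketched over a deliberately simplified matcher model. Everything here (`Tt`, `EPS`, `last`) is a hypothetical illustration, not rustc's representation. The `*` and `+` repetition cases are left out because their sub-cases are elided in this diff, and the final "otherwise" branch for sequences, LAST(M) = LAST(`uu ...`), is an assumption based on the usual statement of the rule.

```rust
use std::collections::BTreeSet;

/// Hypothetical, simplified token-tree model (an assumption for this sketch).
enum Tt {
    /// A plain token or simple NT, e.g. `,` or `$e:expr`.
    Token(&'static str),
    /// `OPEN tt ... CLOSE`; only the closing delimiter matters for LAST.
    Delimited(&'static str),
    /// `$( tt ... ) ?`
    Optional(Vec<Tt>),
}

/// Marker standing in for ε.
const EPS: &str = "ε";

fn last(m: &[Tt]) -> BTreeSet<&'static str> {
    match m {
        // Empty sequence: LAST(M) = { ε }.
        [] => BTreeSet::from([EPS]),
        // Singleton token t: LAST(M) = { t }.
        [Tt::Token(t)] => BTreeSet::from([*t]),
        // Delimited sequence: LAST(M) = { CLOSE }.
        [Tt::Delimited(close)] => BTreeSet::from([*close]),
        // `$( tt ... ) ?`: LAST(M) = LAST(`tt ...`) ∪ { ε }.
        [Tt::Optional(inner)] => {
            let mut set = last(inner);
            set.insert(EPS);
            set
        }
        // Sequence `tt uu ...` with `uu ...` non-empty.
        [tt, rest @ ..] => {
            let mut tail = last(rest);
            if tail.remove(EPS) {
                // ε ∈ LAST(`uu ...`): add LAST(`tt`) to LAST(`uu ...`) \ { ε }.
                tail.extend(last(std::slice::from_ref(tt)));
            }
            // Otherwise (assumed): LAST(M) = LAST(`uu ...`).
            tail
        }
    }
}

fn main() {
    // Models the matcher `$d:ident $( , $e:expr )?`.
    let m = [
        Tt::Token("$d:ident"),
        Tt::Optional(vec![Tt::Token(","), Tt::Token("$e:expr")]),
    ];
    println!("{:?}", last(&m));
}
```

Because the optional tail may match nothing, both `$d:ident` and `$e:expr` can end a fragment matched by the example matcher, so both appear in the computed set.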
@@ -320,25 +363,35 @@ Here are similar examples but now for LAST.
 
 ### FOLLOW(M)
 
+r[macro.ambiguity.sets.def.follow]
+
+r[macro.ambiguity.sets.def.follow.intro]
 Finally, the definition for FOLLOW(M) is built up as follows. pat, expr, etc.
 represent simple nonterminals with the given fragment specifier.
 
+r[macro.ambiguity.sets.def.follow.pat]
   * FOLLOW(pat) = {`=>`, `,`, `=`, `|`, `if`, `in`}.
 
+r[macro.ambiguity.sets.def.follow.expr-stmt]
   * FOLLOW(expr) = FOLLOW(expr_2021) = FOLLOW(stmt) = {`=>`, `,`, `;`}.
 
+r[macro.ambiguity.sets.def.follow.ty-path]
   * FOLLOW(ty) = FOLLOW(path) = {`{`, `[`, `,`, `=>`, `:`, `=`, `>`, `>>`, `;`,
     `|`, `as`, `where`, block nonterminals}.
 
+r[macro.ambiguity.sets.def.follow.vis]
   * FOLLOW(vis) = {`,`; any keyword or identifier except a non-raw `priv`; any
     token that can begin a type; ident, ty, and path nonterminals}.
 
+r[macro.ambiguity.sets.def.follow.simple]
   * FOLLOW(t) = ANYTOKEN for any other simple token, including block, ident,
     tt, item, lifetime, literal and meta simple nonterminals, and all terminals.
 
+r[macro.ambiguity.sets.def.follow.other-matcher]
   * FOLLOW(M), for any other M, is defined as the intersection, as t ranges over
     (LAST(M) \ {ε}), of FOLLOW(t).
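
To make the follow sets above concrete, here is a small sketch with two macros whose matchers only place permitted tokens after `expr` and `ty` fragments. The macro names `pairs!` and `typed_default!` are hypothetical, and the note on the rejected matcher describes expected behavior rather than a quoted compiler message.

```rust
// Hypothetical macros, for illustration only.

// `$k:expr` is followed by `=>` and `$v:expr` by `,`; both are in FOLLOW(expr).
macro_rules! pairs {
    ($($k:expr => $v:expr),* $(,)?) => { vec![$(($k, $v)),*] };
}

// `$t:ty` is followed by `;`, which is in FOLLOW(ty).
macro_rules! typed_default {
    ($t:ty ; $n:expr) => { [<$t>::default(); $n] };
}

// A matcher such as `($e:expr $i:ident)` is expected to be rejected when the
// macro is defined, since an `ident` nonterminal is not in FOLLOW(expr).

fn main() {
    assert_eq!(pairs!(1 => "a", 2 => "b",), vec![(1, "a"), (2, "b")]);
    assert_eq!(typed_default!(u8; 3), [0u8, 0, 0]);
}
```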
 
+r[macro.ambiguity.sets.def.follow.type-first]
 The tokens that can begin a type are, as of this writing, {`(`, `[`, `!`, `*`,
 `&`, `&&`, `?`, lifetimes, `>`, `>>`, `::`, any non-keyword identifier, `super`,
 `self`, `Self`, `extern`, `crate`, `$crate`, `_`, `for`, `impl`, `fn`, `unsafe`,