Commit 889a660

Merge pull request #1634 from chorman0773/spec-add-identifiers-macro-ambiguity
Add identifier syntax to macro-ambiguity.md
2 parents 206755b + 806170f commit 889a660

1 file changed, +53 -0 lines changed

src/macro-ambiguity.md

+53
@@ -1,11 +1,16 @@
 # Appendix: Macro Follow-Set Ambiguity Formal Specification

+r[macro.ambiguity]
+
 This page documents the formal specification of the follow rules for [Macros
 By Example]. They were originally specified in [RFC 550], from which the bulk
 of this text is copied, and expanded upon in subsequent RFCs.

 ## Definitions & Conventions

+r[macro.ambiguity.convention]
+
+r[macro.ambiguity.convention.defs]
 - `macro`: anything invokable as `foo!(...)` in source code.
 - `MBE`: macro-by-example, a macro defined by `macro_rules`.
 - `matcher`: the left-hand-side of a rule in a `macro_rules` invocation, or a
@@ -46,11 +51,13 @@ macro_rules! i_am_an_mbe {
 }
 ```

+r[macro.ambiguity.convention.matcher]
 `(start $foo:expr $($i:ident),* end)` is a matcher. The whole matcher is a
 delimited sequence (with open- and close-delimiters `(` and `)`), and `$foo`
 and `$i` are simple NT's with `expr` and `ident` as their respective fragment
 specifiers.

+r[macro.ambiguity.convention.complex-nt]
 `$(i:ident),*` is *also* an NT; it is a complex NT that matches a
 comma-separated repetition of identifiers. The `,` is the separator token for
 the complex NT; it occurs in between each pair of elements (if any) of the
@@ -65,16 +72,19 @@ token.
 proper nesting of token tree structure and correct matching of open- and
 close-delimiters.)

+r[macro.ambiguity.convention.vars]
 We will tend to use the variable "M" to stand for a matcher, variables "t" and
 "u" for arbitrary individual tokens, and the variables "tt" and "uu" for
 arbitrary token trees. (The use of "tt" does present potential ambiguity with
 its additional role as a fragment specifier; but it will be clear from context
 which interpretation is meant.)

+r[macro.ambiguity.convention.set]
 "SEP" will range over separator tokens, "OP" over the repetition operators
 `*`, `+`, and `?`, "OPEN"/"CLOSE" over matching token pairs surrounding a
 delimited sequence (e.g. `[` and `]`).

+r[macro.ambiguity.convention.sequence-vars]
 Greek letters "α" "β" "γ" "δ" stand for potentially empty token-tree sequences.
 (However, the Greek letter "ε" (epsilon) has a special role in the presentation
 and does not stand for a token-tree sequence.)
@@ -101,6 +111,9 @@ purposes of the formalism, we will treat `$v:vis` as actually being

 ### The Matcher Invariants

+r[macro.ambiguity.invariant]
+
+r[macro.ambiguity.invariant.list]
 To be valid, a matcher must meet the following three invariants. The definitions
 of FIRST and FOLLOW are described later.

@@ -112,18 +125,21 @@ of FIRST and FOLLOW are described later.
 1. For an unseparated complex NT in a matcher, `M = ... $(tt ...) OP ...`, if
 OP = `*` or `+`, we must have FOLLOW(`tt ...`) ⊇ FIRST(`tt ...`).

+r[macro.ambiguity.invariant.follow-matcher]
 The first invariant says that whatever actual token that comes after a matcher,
 if any, must be somewhere in the predetermined follow set. This ensures that a
 legal macro definition will continue to assign the same determination as to
 where `... tt` ends and `uu ...` begins, even as new syntactic forms are added
 to the language.

+r[macro.ambiguity.invariant.separated-complex-nt]
 The second invariant says that a separated complex NT must use a separator token
 that is part of the predetermined follow set for the internal contents of the
 NT. This ensures that a legal macro definition will continue to parse an input
 fragment into the same delimited sequence of `tt ...`'s, even as new syntactic
 forms are added to the language.

+r[macro.ambiguity.invariant.unseparated-complex-nt]
 The third invariant says that when we have a complex NT that can match two or
 more copies of the same thing with no separation in between, it must be
 permissible for them to be placed next to each other as per the first invariant.
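
As a rough illustration of the three invariants described in this hunk, the sketch below defines three small macros whose matchers are expected to satisfy them. The macro names (`repeat_value`, `sum_all`, `count_idents`) are hypothetical and do not appear in the reference text; the follow sets cited in the comments are the ones this file lists under FOLLOW(M).

```rust
// Invariant 1: the token after a simple NT must be in its FOLLOW set.
// `;` is in FOLLOW(expr), so `$value:expr ;` is accepted.
macro_rules! repeat_value {
    ($value:expr ; $count:expr) => {
        vec![$value; $count]
    };
}

// Invariant 2: the separator of a separated complex NT must be in the
// FOLLOW set of the repetition's contents. `,` is in FOLLOW(expr).
macro_rules! sum_all {
    ($($term:expr),*) => {
        0 $(+ $term)*
    };
}

// Invariant 3: an unseparated repetition places its contents next to
// each other. FOLLOW(ident) is ANYTOKEN, so it contains FIRST(`$name:ident`).
macro_rules! count_idents {
    ($($name:ident)*) => {{
        // Collect the identifier names so that an empty input still type-checks.
        let names: &[&str] = &[$(stringify!($name)),*];
        names.len()
    }};
}
```

For instance, `sum_all!(1, 2, 3)` evaluates to `6` and `count_idents!(a b c)` to `3`. By contrast, a matcher such as `($a:expr $b:expr)` would violate the first invariant, since an `expr` nonterminal is not in FOLLOW(expr), and the compiler is expected to reject it.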
@@ -137,6 +153,9 @@ invalid in a future edition of Rust. See the [tracking issue].**

 ### FIRST and FOLLOW, informally

+r[macro.ambiguity.sets]
+
+r[macro.ambiguity.sets.intro]
 A given matcher M maps to three sets: FIRST(M), LAST(M) and FOLLOW(M).

 Each of the three sets is made up of tokens. FIRST(M) and LAST(M) may also
@@ -145,12 +164,15 @@ can match the empty fragment. (But FOLLOW(M) is always just a set of tokens.)

 Informally:

+r[macro.ambiguity.sets.first]
 * FIRST(M): collects the tokens potentially used first when matching a
 fragment to M.

+r[macro.ambiguity.sets.last]
 * LAST(M): collects the tokens potentially used last when matching a fragment
 to M.

+r[macro.ambiguity.sets.follow]
 * FOLLOW(M): the set of tokens allowed to follow immediately after some
 fragment matched by M.

@@ -163,6 +185,7 @@ Informally:

 * The concatenation α β γ δ is a parseable Rust program.

+r[macro.ambiguity.sets.universe]
 We use the shorthand ANYTOKEN to denote the set of all tokens (including simple
 NTs). For example, if any token is legal after a matcher M, then FOLLOW(M) =
 ANYTOKEN.
@@ -174,18 +197,27 @@ definitions.)

 ### FIRST, LAST

+r[macro.ambiguity.sets.def]
+
+r[macro.ambiguity.sets.def.intro]
 Below are formal inductive definitions for FIRST and LAST.

+r[macro.ambiguity.sets.def.notation]
 "A ∪ B" denotes set union, "A ∩ B" denotes set intersection, and "A \ B"
 denotes set difference (i.e. all elements of A that are not present in B).

 #### FIRST

+r[macro.ambiguity.sets.def.first]
+
+r[macro.ambiguity.sets.def.first.intro]
 FIRST(M) is defined by case analysis on the sequence M and the structure of its
 first token-tree (if any):

+r[macro.ambiguity.sets.def.first.epsilon]
 * if M is the empty sequence, then FIRST(M) = { ε },

+r[macro.ambiguity.sets.def.first.token]
 * if M starts with a token t, then FIRST(M) = { t },

 (Note: this covers the case where M starts with a delimited token-tree
@@ -195,6 +227,7 @@ first token-tree (if any):
 (Note: this critically relies on the property that no simple NT matches the
 empty fragment.)

+r[macro.ambiguity.sets.def.first.complex]
 * Otherwise, M is a token-tree sequence starting with a complex NT: `M = $( tt
 ... ) OP α`, or `M = $( tt ... ) SEP OP α`, (where `α` is the (potentially
 empty) sequence of token trees for the rest of the matcher).
@@ -229,12 +262,18 @@ with respect to \varepsilon as well.

 #### LAST

+r[macro.ambiguity.sets.def.last]
+
+r[macro.ambiguity.sets.def.last.intro]
 LAST(M), defined by case analysis on M itself (a sequence of token-trees):

+r[macro.ambiguity.sets.def.last.empty]
 * if M is the empty sequence, then LAST(M) = { ε }

+r[macro.ambiguity.sets.def.last.token]
 * if M is a singleton token t, then LAST(M) = { t }

+r[macro.ambiguity.sets.def.last.rep-star]
 * if M is the singleton complex NT repeating zero or more times, `M = $( tt
 ... ) *`, or `M = $( tt ... ) SEP *`

@@ -245,6 +284,7 @@ LAST(M), defined by case analysis on M itself (a sequence of token-trees):
 * otherwise, the sequence `tt ...` must be non-empty; LAST(M) = LAST(`tt
 ...`) ∪ {ε}.

+r[macro.ambiguity.sets.def.last.rep-plus]
 * if M is the singleton complex NT repeating one or more times, `M = $( tt ...
 ) +`, or `M = $( tt ... ) SEP +`

@@ -255,12 +295,15 @@ LAST(M), defined by case analysis on M itself (a sequence of token-trees):
 * otherwise, the sequence `tt ...` must be non-empty; LAST(M) = LAST(`tt
 ...`)

+r[macro.ambiguity.sets.def.last.rep-question]
 * if M is the singleton complex NT repeating zero or one time, `M = $( tt ...)
 ?`, then LAST(M) = LAST(`tt ...`) ∪ {ε}.

+r[macro.ambiguity.sets.def.last.delim]
 * if M is a delimited token-tree sequence `OPEN tt ... CLOSE`, then LAST(M) =
 { `CLOSE` }.

+r[macro.ambiguity.sets.def.last.sequence]
 * if M is a non-empty sequence of token-trees `tt uu ...`,

 * If ε ∈ LAST(`uu ...`), then LAST(M) = LAST(`tt`) ∪ (LAST(`uu ...`) \ { ε }).
@@ -320,25 +363,35 @@ Here are similar examples but now for LAST.

 ### FOLLOW(M)

+r[macro.ambiguity.sets.def.follow]
+
+r[macro.ambiguity.sets.def.follow.intro]
 Finally, the definition for FOLLOW(M) is built up as follows. pat, expr, etc.
 represent simple nonterminals with the given fragment specifier.

+r[macro.ambiguity.sets.def.follow.pat]
 * FOLLOW(pat) = {`=>`, `,`, `=`, `|`, `if`, `in`}`.

+r[macro.ambiguity.sets.def.follow.expr-stmt]
 * FOLLOW(expr) = FOLLOW(expr_2021) = FOLLOW(stmt) = {`=>`, `,`, `;`}`.

+r[macro.ambiguity.sets.def.follow.ty-path]
 * FOLLOW(ty) = FOLLOW(path) = {`{`, `[`, `,`, `=>`, `:`, `=`, `>`, `>>`, `;`,
 `|`, `as`, `where`, block nonterminals}.

+r[macro.ambiguity.sets.def.follow.vis]
 * FOLLOW(vis) = {`,`l any keyword or identifier except a non-raw `priv`; any
 token that can begin a type; ident, ty, and path nonterminals}.

+r[macro.ambiguity.sets.def.follow.simple]
 * FOLLOW(t) = ANYTOKEN for any other simple token, including block, ident,
 tt, item, lifetime, literal and meta simple nonterminals, and all terminals.

+r[macro.ambiguity.sets.def.follow.other-matcher]
 * FOLLOW(M), for any other M, is defined as the intersection, as t ranges over
 (LAST(M) \ {ε}), of FOLLOW(t).

+r[macro.ambiguity.sets.def.follow.type-first]
 The tokens that can begin a type are, as of this writing, {`(`, `[`, `!`, `*`,
 `&`, `&&`, `?`, lifetimes, `>`, `>>`, `::`, any non-keyword identifier, `super`,
 `self`, `Self`, `extern`, `crate`, `$crate`, `_`, `for`, `impl`, `fn`, `unsafe`,
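
The FOLLOW sets listed in this hunk can be observed directly in `macro_rules!` matchers. The sketch below is illustrative only: the macro names (`matches_guarded`, `typed_value`, `struct_name`) are hypothetical and not part of the reference, and each matcher simply places a token from the relevant FOLLOW set after a `pat`, `ty`, or `vis` nonterminal.

```rust
// `,` is in FOLLOW(expr) and `if` is in FOLLOW(pat), so both
// `$value:expr ,` and `$pattern:pat if` are accepted.
macro_rules! matches_guarded {
    ($value:expr, $pattern:pat if $guard:expr) => {
        match $value {
            $pattern if $guard => true,
            _ => false,
        }
    };
}

// `=` is in FOLLOW(ty), so `$t:ty =` is accepted.
macro_rules! typed_value {
    ($t:ty = $init:expr) => {{
        let value: $t = $init;
        value
    }};
}

// Keywords other than a non-raw `priv` are in FOLLOW(vis), so the
// keyword `struct` may follow `$v:vis`. `$v` is matched (possibly as
// the empty visibility) but deliberately unused in the expansion.
macro_rules! struct_name {
    ($v:vis struct $name:ident) => {
        stringify!($name)
    };
}
```

For example, `matches_guarded!(5, n if n > 3)` evaluates to `true`, `typed_value!(u8 = 200)` yields a `u8`, and `struct_name!(pub struct Point)` yields `"Point"`.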
