1
1
# Appendix: Macro Follow-Set Ambiguity Formal Specification
2
2
3
+ r[macro.ambiguity]
4
+
3
5
This page documents the formal specification of the follow rules for [Macros
4
6
By Example]. They were originally specified in [RFC 550], from which the bulk
5
7
of this text is copied, and expanded upon in subsequent RFCs.
6
8
7
9
## Definitions & Conventions
8
10
11
+ r[macro.ambiguity.convention]
12
+
13
+ r[macro.ambiguity.convention.defs]
9
14
- `macro`: anything invokable as `foo!(...)` in source code.
10
15
- `MBE`: macro-by-example, a macro defined by `macro_rules`.
11
16
- `matcher`: the left-hand-side of a rule in a `macro_rules` invocation, or a
@@ -46,11 +51,13 @@ macro_rules! i_am_an_mbe {
46
51
}
47
52
```
48
53
54
+ r[macro.ambiguity.convention.matcher]
49
55
`(start $foo:expr $($i:ident),* end)` is a matcher. The whole matcher is a
50
56
delimited sequence (with open- and close-delimiters `(` and `)`), and `$foo`
51
57
and `$i` are simple NTs with `expr` and `ident` as their respective fragment
52
58
specifiers.
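
As a hedged illustration of this terminology: the matcher above is shown for its structure, and as written it appears to conflict with the follow rules specified later on this page, because `$foo:expr` is immediately followed by an identifier fragment. The sketch below therefore adds a `,` after `$foo:expr` (an assumption made only for this example) so that the definition is accepted. The macro name `matcher_demo` and its expansion are hypothetical.

```rust
// Hypothetical macro: `matcher_demo` is not part of the original text.
macro_rules! matcher_demo {
    // A delimited sequence `( ... )` containing, in order:
    //   `start`          a literal token,
    //   `$foo:expr`      a simple NT with fragment specifier `expr`,
    //   `,`              added for this sketch so `$foo:expr` has a legal follower,
    //   `$($i:ident),*`  a complex NT whose separator token is `,`,
    //   `end`            a literal token.
    (start $foo:expr, $($i:ident),* end) => {
        // Expand to the expression's value plus the names of the matched identifiers.
        ($foo, vec![$(stringify!($i)),*])
    };
}
```

For instance, `matcher_demo!(start 1 + 2, a, b end)` would bind `$foo` to `1 + 2` and `$i` to `a` and then `b`.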
53
59
60
+ r[macro.ambiguity.convention.complex-nt]
54
61
`$($i:ident),*` is *also* an NT; it is a complex NT that matches a
55
62
comma-separated repetition of identifiers. The `,` is the separator token for
56
63
the complex NT; it occurs in between each pair of elements (if any) of the
@@ -65,16 +72,19 @@ token.
65
72
proper nesting of token tree structure and correct matching of open- and
66
73
close-delimiters.)
67
74
75
+ r[macro.ambiguity.convention.vars]
68
76
We will tend to use the variable "M" to stand for a matcher, variables "t" and
69
77
"u" for arbitrary individual tokens, and the variables "tt" and "uu" for
70
78
arbitrary token trees. (The use of "tt" does present potential ambiguity with
71
79
its additional role as a fragment specifier; but it will be clear from context
72
80
which interpretation is meant.)
73
81
82
+ r[macro.ambiguity.convention.set]
74
83
"SEP" will range over separator tokens, "OP" over the repetition operators
75
84
`*`, `+`, and `?`, "OPEN"/"CLOSE" over matching token pairs surrounding a
76
85
delimited sequence (e.g. `[` and `]`).
77
86
87
+ r[macro.ambiguity.convention.sequence-vars]
78
88
Greek letters "α" "β" "γ" "δ" stand for potentially empty token-tree sequences.
79
89
(However, the Greek letter "ε" (epsilon) has a special role in the presentation
80
90
and does not stand for a token-tree sequence.)
@@ -101,6 +111,9 @@ purposes of the formalism, we will treat `$v:vis` as actually being
101
111
102
112
### The Matcher Invariants
103
113
114
+ r[macro.ambiguity.invariant]
115
+
116
+ r[macro.ambiguity.invariant.list]
104
117
To be valid, a matcher must meet the following three invariants. The definitions
105
118
of FIRST and FOLLOW are described later.
106
119
@@ -112,18 +125,21 @@ of FIRST and FOLLOW are described later.
112
125
1. For an unseparated complex NT in a matcher, `M = ... $(tt ...) OP ...`, if
113
126
   OP = `*` or `+`, we must have FOLLOW(`tt ...`) ⊇ FIRST(`tt ...`).
114
127
128
+ r[macro.ambiguity.invariant.follow-matcher]
115
129
The first invariant says that whatever actual token comes after a matcher,
116
130
if any, must be somewhere in the predetermined follow set. This ensures that a
117
131
legal macro definition will continue to assign the same determination as to
118
132
where `... tt` ends and `uu ...` begins, even as new syntactic forms are added
119
133
to the language.
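
A minimal sketch of the first invariant, using a hypothetical macro `pair` (the name and expansion are assumptions for illustration). The token `=>` is listed in FOLLOW(expr) in the definition of FOLLOW on this page, so an `expr` fragment may be followed by it. The commented-out rule places another fragment directly after `$a:expr` and would be expected to be rejected.

```rust
// Hypothetical macro: `pair` is not part of the original text.
macro_rules! pair {
    // Accepted: `$key:expr` is followed by `=>`, which is in FOLLOW(expr).
    ($key:expr => $value:expr) => {
        ($key, $value)
    };
    // Expected to be rejected if uncommented: `$a:expr` would be followed by
    // `$b:expr`, and no token that begins an expression is in FOLLOW(expr).
    // ($a:expr $b:expr) => { ($a, $b) };
}
```

With this definition, `pair!(1 + 1 => "two")` ends the first expression at `=>`, regardless of what expression syntax is added to the language later.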
120
134
135
+ r[macro.ambiguity.invariant.separated-complex-nt]
121
136
The second invariant says that a separated complex NT must use a separator token
122
137
that is part of the predetermined follow set for the internal contents of the
123
138
NT. This ensures that a legal macro definition will continue to parse an input
124
139
fragment into the same delimited sequence of `tt ...`'s, even as new syntactic
125
140
forms are added to the language.
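
A minimal sketch of the second invariant, using a hypothetical macro `sum_all` (the name and expansion are assumptions for illustration). The separator `;` is listed in FOLLOW(expr) in the definition of FOLLOW on this page, so it may separate repeated `expr` fragments. The commented-out rule uses `|` as the separator and would be expected to be rejected.

```rust
// Hypothetical macro: `sum_all` is not part of the original text.
macro_rules! sum_all {
    // Accepted: the separator `;` is in FOLLOW(expr), so each `$e:expr`
    // has a well-defined end inside the repetition.
    ($($e:expr);*) => {
        0 $(+ $e)*
    };
    // Expected to be rejected if uncommented: `|` is not in FOLLOW(expr).
    // ($($e:expr)|*) => { 0 $(+ $e)* };
}
```

Here `sum_all!(1; 2; 3)` splits its input into the three fragments `1`, `2` and `3`.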
126
141
142
+ r[macro.ambiguity.invariant.unseparated-complex-nt]
127
143
The third invariant says that when we have a complex NT that can match two or
128
144
more copies of the same thing with no separation in between, it must be
129
145
permissible for them to be placed next to each other as per the first invariant.
@@ -137,6 +153,9 @@ invalid in a future edition of Rust. See the [tracking issue].**
137
153
138
154
### FIRST and FOLLOW, informally
139
155
156
+ r[macro.ambiguity.sets]
157
+
158
+ r[macro.ambiguity.sets.intro]
140
159
A given matcher M maps to three sets: FIRST(M), LAST(M) and FOLLOW(M).
141
160
142
161
Each of the three sets is made up of tokens. FIRST(M) and LAST(M) may also
@@ -145,12 +164,15 @@ can match the empty fragment. (But FOLLOW(M) is always just a set of tokens.)
145
164
146
165
Informally:
147
166
167
+ r[macro.ambiguity.sets.first]
148
168
* FIRST(M): collects the tokens potentially used first when matching a
149
169
fragment to M.
150
170
171
+ r[macro.ambiguity.sets.last]
151
172
* LAST(M): collects the tokens potentially used last when matching a fragment
152
173
to M.
153
174
175
+ r[macro.ambiguity.sets.follow]
154
176
* FOLLOW(M): the set of tokens allowed to follow immediately after some
155
177
fragment matched by M.
156
178
@@ -163,6 +185,7 @@ Informally:
163
185
164
186
* The concatenation α β γ δ is a parseable Rust program.
165
187
188
+ r[macro.ambiguity.sets.universe]
166
189
We use the shorthand ANYTOKEN to denote the set of all tokens (including simple
167
190
NTs). For example, if any token is legal after a matcher M, then FOLLOW(M) =
168
191
ANYTOKEN.
@@ -174,18 +197,27 @@ definitions.)
174
197
175
198
### FIRST, LAST
176
199
200
+ r[macro.ambiguity.sets.def]
201
+
202
+ r[macro.ambiguity.sets.def.intro]
177
203
Below are formal inductive definitions for FIRST and LAST.
178
204
205
+ r[macro.ambiguity.sets.def.notation]
179
206
"A ∪ B" denotes set union, "A ∩ B" denotes set intersection, and "A \ B"
180
207
denotes set difference (i.e. all elements of A that are not present in B).
181
208
182
209
#### FIRST
183
210
211
+ r[macro.ambiguity.sets.def.first]
212
+
213
+ r[macro.ambiguity.sets.def.first.intro]
184
214
FIRST(M) is defined by case analysis on the sequence M and the structure of its
185
215
first token-tree (if any):
186
216
217
+ r[macro.ambiguity.sets.def.first.epsilon]
187
218
* if M is the empty sequence, then FIRST(M) = { ε },
188
219
220
+ r[macro.ambiguity.sets.def.first.token]
189
221
* if M starts with a token t, then FIRST(M) = { t },
190
222
191
223
(Note: this covers the case where M starts with a delimited token-tree
@@ -195,6 +227,7 @@ first token-tree (if any):
195
227
(Note: this critically relies on the property that no simple NT matches the
196
228
empty fragment.)
197
229
230
+ r[macro.ambiguity.sets.def.first.complex]
198
231
* Otherwise, M is a token-tree sequence starting with a complex NT: `M = $( tt
199
232
... ) OP α`, or `M = $( tt ... ) SEP OP α`, (where `α` is the (potentially
200
233
empty) sequence of token trees for the rest of the matcher).
@@ -229,12 +262,18 @@ with respect to \varepsilon as well.
229
262
230
263
#### LAST
231
264
265
+ r[macro.ambiguity.sets.def.last]
266
+
267
+ r[macro.ambiguity.sets.def.last.intro]
232
268
LAST(M) is defined by case analysis on M itself (a sequence of token-trees):
233
269
270
+ r[macro.ambiguity.sets.def.last.empty]
234
271
* if M is the empty sequence, then LAST(M) = { ε }
235
272
273
+ r[macro.ambiguity.sets.def.last.token]
236
274
* if M is a singleton token t, then LAST(M) = { t }
237
275
276
+ r[macro.ambiguity.sets.def.last.rep-star]
238
277
* if M is the singleton complex NT repeating zero or more times, `M = $( tt
239
278
... ) *`, or `M = $( tt ... ) SEP *`
240
279
@@ -245,6 +284,7 @@ LAST(M), defined by case analysis on M itself (a sequence of token-trees):
245
284
* otherwise, the sequence `tt ...` must be non-empty; LAST(M) = LAST(`tt
246
285
...`) ∪ {ε}.
247
286
287
+ r[macro.ambiguity.sets.def.last.rep-plus]
248
288
* if M is the singleton complex NT repeating one or more times, `M = $( tt ...
249
289
) +`, or `M = $( tt ... ) SEP +`
250
290
@@ -255,12 +295,15 @@ LAST(M), defined by case analysis on M itself (a sequence of token-trees):
255
295
* otherwise, the sequence `tt ...` must be non-empty; LAST(M) = LAST(`tt
256
296
...`)
257
297
298
+ r[macro.ambiguity.sets.def.last.rep-question]
258
299
* if M is the singleton complex NT repeating zero or one time, `M = $( tt ...)
259
300
?`, then LAST(M) = LAST(`tt ...`) ∪ {ε}.
260
301
302
+ r[macro.ambiguity.sets.def.last.delim]
261
303
* if M is a delimited token-tree sequence `OPEN tt ... CLOSE`, then LAST(M) =
262
304
{ `CLOSE` }.
263
305
306
+ r[macro.ambiguity.sets.def.last.sequence]
264
307
* if M is a non-empty sequence of token-trees `tt uu ...`,
265
308
266
309
* If ε ∈ LAST(`uu ...`), then LAST(M) = LAST(`tt`) ∪ (LAST(`uu ...`) \ { ε }).
@@ -320,25 +363,35 @@ Here are similar examples but now for LAST.
320
363
321
364
### FOLLOW(M)
322
365
366
+ r[macro.ambiguity.sets.def.follow]
367
+
368
+ r[macro.ambiguity.sets.def.follow.intro]
323
369
Finally, the definition for FOLLOW(M) is built up as follows. pat, expr, etc.
324
370
represent simple nonterminals with the given fragment specifier.
325
371
372
+ r[macro.ambiguity.sets.def.follow.pat]
326
373
* FOLLOW(pat) = {`=>`, `,`, `=`, `|`, `if`, `in`}.
327
374
375
+ r[macro.ambiguity.sets.def.follow.expr-stmt]
328
376
* FOLLOW(expr) = FOLLOW(expr_2021) = FOLLOW(stmt) = {`=>`, `,`, `;`}.
329
377
378
+ r[macro.ambiguity.sets.def.follow.ty-path]
330
379
* FOLLOW(ty) = FOLLOW(path) = {`{`, `[`, `,`, `=>`, `:`, `=`, `>`, `>>`, `;`,
331
380
`|`, `as`, `where`, block nonterminals}.
332
381
382
+ r[macro.ambiguity.sets.def.follow.vis]
333
383
* FOLLOW(vis) = {`,`; any keyword or identifier except a non-raw `priv`; any
334
384
token that can begin a type; ident, ty, and path nonterminals}.
335
385
386
+ r[macro.ambiguity.sets.def.follow.simple]
336
387
* FOLLOW(t) = ANYTOKEN for any other simple token, including block, ident,
337
388
tt, item, lifetime, literal and meta simple nonterminals, and all terminals.
338
389
390
+ r[macro.ambiguity.sets.def.follow.other-matcher]
339
391
* FOLLOW(M), for any other M, is defined as the intersection, as t ranges over
340
392
(LAST(M) \ {ε}), of FOLLOW(t).
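
The intersection rule above can be sketched as a small model. This is an illustration under stated assumptions, not the compiler's implementation: the names `Follow`, `only`, `follow_simple` and `follow_matcher` are hypothetical, tokens are modelled as plain strings, and only the finite sets for `pat`, `expr`, `expr_2021` and `stmt` plus the ANYTOKEN cases are transcribed. The sets for `ty`, `path` and `vis` contain token classes and are left unmodelled (`None`). Treating the intersection over an empty LAST(M) \ {ε} as ANYTOKEN is also an assumption of this sketch.

```rust
use std::collections::BTreeSet;

/// A follow set: `Any` models ANYTOKEN, `Only` a finite set of token spellings.
#[derive(Debug, Clone, PartialEq)]
enum Follow {
    Any,
    Only(BTreeSet<&'static str>),
}

/// Builds a finite follow set from token spellings.
fn only(tokens: &[&'static str]) -> Follow {
    Follow::Only(tokens.iter().copied().collect())
}

/// FOLLOW for some simple NTs, transcribed from the list above.
/// Returns `None` for fragments this sketch does not model (`ty`, `path`, `vis`, ...).
fn follow_simple(fragment: &str) -> Option<Follow> {
    match fragment {
        "pat" => Some(only(&["=>", ",", "=", "|", "if", "in"])),
        "expr" | "expr_2021" | "stmt" => Some(only(&["=>", ",", ";"])),
        "block" | "ident" | "tt" | "item" | "lifetime" | "literal" | "meta" => {
            Some(Follow::Any)
        }
        _ => None,
    }
}

/// Intersection of two follow sets; ANYTOKEN acts as the identity.
fn intersect(a: Follow, b: Follow) -> Follow {
    match (a, b) {
        (Follow::Any, other) => other,
        (other, Follow::Any) => other,
        (Follow::Only(x), Follow::Only(y)) => {
            Follow::Only(x.intersection(&y).copied().collect())
        }
    }
}

/// FOLLOW(M) as the intersection of FOLLOW(t) for t in LAST(M) \ {ε},
/// where each t is given by its fragment specifier name.
fn follow_matcher(last_without_epsilon: &[&str]) -> Option<Follow> {
    let mut acc = Follow::Any;
    for fragment in last_without_epsilon {
        acc = intersect(acc, follow_simple(fragment)?);
    }
    Some(acc)
}
```

For example, if LAST(M) \ {ε} consists of a `pat` NT and an `expr` NT, the model yields the tokens common to both follow sets, `=>` and `,`.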
341
393
394
+ r[macro.ambiguity.sets.def.follow.type-first]
342
395
The tokens that can begin a type are, as of this writing, {`(`, `[`, `!`, `*`,
343
396
`&`, `&&`, `?`, lifetimes, `>`, `>>`, `::`, any non-keyword identifier, `super`,
344
397
`self`, `Self`, `extern`, `crate`, `$crate`, `_`, `for`, `impl`, `fn`, `unsafe`,