Skip to content

Fix precedence of '..' range notation and some grammar inconsistencies. #20958

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

dgrunwald
Copy link
Contributor

Fixes #20811 and #20241.

  • .. precedence changed to between comparison operators and bit or; using the formal grammar:
BitOrExpr = BitXorExpr | BitOrExpr "^" BitXorExpr
RangeExpr = BitOrExpr | ".." BitOrExpr | BitOrExpr ".." | BitOrExpr ".." BitOrExpr
CompExpr = RangeExpr | RangeExpr ("==" | "!=" | "<" | ">" | "<=" | ">=") RangeExpr
AndExpr = CompExpr | AndExpr "&&" CompExpr
  • Where the above grammar is ambiguous due to collisions between binary operators and prefix operators (e.g. r == 1.. && condition), && and || are parsed as binary operators, other tokens are parsed as prefix operators.
  • for i in 1.. {} is parsed as for i in (1..) {}

Implementation note: this changes the semantics of parse_more_binops() to parse binops with precedence >= min_prec.
Previously, it would parse binops with precedence > min_prec.

Fixes rust-lang#20811 and rust-lang#20241.

Note: this changes the semantics of parse_more_binops() to parse binops with
precedence >= min_prec.
Previously, it would parse binops with precedence > min_prec.
false
}
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this ensures that [a]..[b] is still treated as ([a])..([b]) rather than ([a]..)[b], which is good. It's unlikely to break, but it might be worth putting a test in for that case anyway?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I definitely think we want tests for a few such cases.

@nikomatsakis
Copy link
Contributor

Sorry for taking a while on this. I wanted to carve out some time to get my head into the right space.

@nikomatsakis
Copy link
Contributor

An immediate thought is that I would like to see some tests of the traditional ambiguities, e.g.:

let x = 3;
if ..x == ..x { ... }

Also

struct Foo { x: uint }
if ..Foo { x: 3 } == ..Foo { x: 3} { ... }

I expect the latter to fail parsing however.

@nikomatsakis
Copy link
Contributor

Why exactly do we want ..1..2 not to parse? Is it just that it's nonsensical, or is there another reason?

@@ -2856,7 +2856,7 @@ impl<'a> Parser<'a> {
}

// Otherwise, we use the unique pointer default.
let subexpression = self.parse_prefix_expr();
let subexpression = self.parse_prefix_expr(prefix_prec);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this means that box ..5 does not parse, which I consider a bug. I'm not really clear on why we change from min_prec to prefix_prec here.

@nikomatsakis
Copy link
Contributor

OK, I left my initial round of comments on dgrunwald@a128662. By and large this seems very nice; I much prefer the min_prec thing to the restrictions. There are a few nits and points of clarification that I left in the comments.

I'd like to ponder the ..&& thing a bit more, but right now I feel sort of dubious about adding a special case just for that. It feels anomalous and I fear that will make the grammar harder to specify down the line.

@nikomatsakis
Copy link
Contributor

cc @pnkfelix and @cmr both of whom I think care about the grammar more than most ;)

@dgrunwald
Copy link
Contributor Author

On box ..5, and the distinction between 1..&x and `1..&&x': the general idea here was to allow only syntax that is compliant with the operator precedence table. That is, an operator with precedence N may not appear as operand of another operator with precedence greater than N without using parentheses.

box 1.. parses as (box 1)..; so it feels weird to allow box ..5 as box (..5). Allowing the latter syntax might give the wrong impression that .. has a higher operator precedence than box.

With &&, the situation is less clear -- whether the operator precedence of && is higher or lower than .. depends on whether we consider && a binary operator or unary operator. My main motivation here was to keep the symmetry between true && r == 1.. and r == 1.. && true.

@nikomatsakis
Copy link
Contributor

On Thu, Jan 15, 2015 at 11:01:39AM -0800, Daniel Grunwald wrote:

On box ..5, and the distinction between 1..&x and `1..&&x': the
general idea here was to allow only syntax that is compliant with
the operator precedence table. That is, an operator with precedence
N may not appear as operand of another operator with precedence
greater than N without using parentheses.

I think the root of my concern is that I think that the precedence of
box is wrong, which is really separate from this issue. I opened
#21192.

box 1.. parses as (box 1)..; so it feels weird to allow box ..5 as box (..5). Allowing the latter syntax might give the wrong
impression that .. has a higher operator precedence than box.

Even if we assume the first parse is correct (contra #21192),
regardless I don't understand why you would not expect .. as a unary
operator to behave differently. For example, box 3+5 currently
parses as (box 3)+5 (which I feel is a bug but whatever), however
box +3 parses fine.

I think what is confusing me a bit is that .. as a unary operator
has a distinct precedence from other unary operators.

With &&, the situation is less clear -- whether the operator
precedence of && is higher or lower than .. depends on whether
we consider && a binary operator or unary operator. My main
motivation here was to keep the symmetry between true && r == 1..
and r == 1.. && true.

I understand the motivation, but it seems to be achieved by
introducing a kind of arbitrary rule in the grammar, which I don't
think is worth it.

@dgrunwald
Copy link
Contributor Author

I think what is confusing me a bit is that .. as a unary operator has a distinct precedence from other unary operators.

There are two unary .. operators: prefix and postfix forms. In fact, .. is the only postfix operator in rust (not exactly true, strictly speaking function calls and indexing are also postfix operators).
I think giving postfix .. a different precedence than binary .. would be confusing (why would the grouping of expressions on the LHS of .. be affected by whether there's an expression on the RHS?).
At the same time, why should the two unary .. operators have different precedences?
IMHO the only sane solution is to give all three versions of .. the same precedence, and that means it must be distinct from the other unary operators.

I understand the motivation, but it seems to be achieved by introducing a kind of arbitrary rule in the grammar, which I don't think is worth it.

The grammar is ambiguous with this choice of operator precedence for ... Resolving the ambiguity either way requires an arbitrary rule.
Disambiguating the other way around would be less of a special case in the implementation, but I don't think we should base our decisions about the grammar based on what needs a little bit less code in a hand-written recursive-descent parser.

If anything, this is an argument that the choice of precedence for .. is wrong, and we should pick one that does not cause an ambiguity. (for example, keep it at the same precedence as the assignment operator)

Then again, we already have some other disambiguation decisions that are pretty specific to a recursive-descent parser implementation (for example, treating the < in i as i32 < 10 as opening a type argument list -- a bottom-up parser would do this differently). :-(

I should maybe separate this pull request into two PRs - one for fixing the grammar inconsistencies while keeping the current precedence (same as assignment operator), and another that increases the precedence to the level suggested here.

@dgrunwald
Copy link
Contributor Author

That is, an operator with precedence N may not appear as operand of another operator with precedence greater than N without using parentheses.

Note that I'd like to keep .. to this rule because it feels like it makes the grammar more consistent (I feel like i.. should be syntactically valid in the same contexts as ..i). However, this is not a hard requirement. In fact, Rust already has another language construct violating this rule: lambda expressions.

a + b * || c * d + e gets parsed as a + (b * (|| ((c * d) + e))). But lambda expressions are commonly parsed this way in other languages (C# is exactly the same), so I think violating the rule is OK for lambdas.

@nikomatsakis
Copy link
Contributor

I guess I don't particularly expect ..i and i.. to parse in precisely the same situations. I rather expect all (symbolic) unary operators to work roughly the same and likewise binary operators and so forth.

@nikomatsakis
Copy link
Contributor

The grammar is ambiguous with this choice of operator precedence for ... Resolving the ambiguity either way requires an arbitrary rule.

What I meant by arbitrary is that the rule kind of picks and chooses what expressions work and which don't, rather than just saying 'a .. XXX' always parses XXX as an expression if it can.

However, it occurs to me that there is a forwards compatibility risk we should be aware of here, similar to macros. If we added new unary operators it might cause a problem.

@dgrunwald
Copy link
Contributor Author

#21374 is an alternative to this PR. It fixes the same grammar inconsistencies, but keeps the operator precedence of .. on the same level as the assignment operator.
This has the advantage of not introducing the r == 1.. && true ambiguity as parentheses are always required when comparing ranges, so we can keep the grammar simple.

bors added a commit that referenced this pull request Jan 23, 2015
This PR is intended as alternative to #20958. It fixes the same grammar inconsistencies, but does not increase the operator precedence of `..`, leaving it at the same level as the assignment operator.
For previous discussion, see #20811 and #20958.

Grammar changes:
* allow `for _ in 1..i {}` (fixes #20241)
* allow `for _ in 1.. {}` as infinite loop
* prevent use of range notation in contexts where only operators of high precedence are expected (fixes #20811)

Parser code cleanup:
* remove `RESTRICTION_NO_DOTS`
* make `AS_PREC` const and follow naming convention
* make `min_prec` inclusive

r? nikomatsakis
@dgrunwald dgrunwald closed this Jan 23, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

range notation (a..b) has weird operator precedence
3 participants