From f3aff9f8de13389ef7090a880219f28195257bb8 Mon Sep 17 00:00:00 2001
From: Bottersnike
Date: Sun, 29 Jun 2025 17:20:12 +0100
Subject: [PATCH 1/3] RFC: Indexing syntax when using long strings

---
 docs/indexing-syntax-long-strings.md | 49 ++++++++++++++++++++++++++++
 1 file changed, 49 insertions(+)
 create mode 100644 docs/indexing-syntax-long-strings.md

diff --git a/docs/indexing-syntax-long-strings.md b/docs/indexing-syntax-long-strings.md
new file mode 100644
index 00000000..ca98bace
--- /dev/null
+++ b/docs/indexing-syntax-long-strings.md
@@ -0,0 +1,49 @@
+# Indexing syntax when using long strings
+
+## Summary
+This RFC defines the parsing logic for `foo[[[a]]]` and `{[[[a]]]=b}`, making both forms parsable and removing whitespace that is currently semantic.
+
+## Motivation
+Luau supports multiline strings by means of long brackets. These take the form `[[foo]]`, `[=[foo]=]` and so forth. Implementations typically consume these during lexical analysis, before parsing has begun.
+
+Luau additionally supports indexing, using square brackets. This can be used as part of an expression (`foo[a]`) or as part of a table literal (`{[a] = foo}`). This is identified during parsing, after lexical analysis has identified a single `[` opening token.
+
+These two rules conflict when attempting to parse an expression of the form `foo[[[a]]]`. No strict behaviour is defined for how this should be parsed, however current implementations consume the first `[[` during lexical analysis, producing the string `[a` followed by a single `]` token. This then fails to parse.
+
+By rewriting this expression as `foo[ [[a]]]` we can ensure the lexical analysis identifies a single `[` followed by a string and then a closing `]`. This whitespace is therefore semantic.
+
+This behaviour is unexpected, as there is a valid way this expression could have been parsed that was not used.
+
+This is currently one of two semantic whitespace occurrences in the language. The other occurs when using a comment after a hanging minus symbol, as demonstrated in the following example:
+
+```lua
+local foo = 5 - -- Foo!
+    1
+```
+
+This RFC does not address this second case, as the solution to this case is far less trivial.
+
+### Existing behaviour across tooling
+The predominant tool used for minification of Luau code at the time of writing, [darklua](https://github.com/seaofvoices/darklua), does not fall into the trap of minifying `foo[ [[a]]]` into `foo[[[a]]]`. It instead minifies to `foo[ [[a]] ]` or `foo['a']` depending on the configuration. Darklua would be able to remove those spaces after this change.
+
+The first party Luau parser follows the semantic-whitespace behaviour.
+
+The predominant Rust parser for Luau, [full-moon](https://github.com/Kampfkarren/full-moon), follows the semantic-whitespace behaviour.
+
+## Design
+A sequence of three opening square brackets (`[[[`) must be parsed as a single opening bracket followed by an opening long bracket (`'[' '[['`).
+
+A sequence of two opening square brackets, followed by an equals symbol (`[[=`) must be parsed as a single opening bracket followed by the beginning of an opening long bracket (`'[' '[==...=['`).
+
+## Drawbacks
+All current Luau parser implementations do not follow this behaviour. They instead follow the behaviour outlined in the Motivation section.
+
+Under the current behaviour `foo[bar[[[a]]]` parses as `foo[bar"[a"]`, however this proposed behaviour would parse that as `foo[bar["a"]` with an unterminated bracket pair. This is not backwards compatible, however the author of this RFC believes this to be a more expected behaviour, and could not find evidence of this problematic pattern being used in the wild.
+
+## Alternatives
+There are two alternatives:
+
+1. Do absolutely nothing. This is undesirable, as this behaviour remains ambiguous
+2. Retain the current behaviour, defining it explicitly with another RFC
+
+Of these, (2.) would be preferable if the proposed behaviour in this RFC is rejected.

From a7a54d6806e533e9192b8ef3666ccb55797032c0 Mon Sep 17 00:00:00 2001
From: Bottersnike
Date: Sun, 29 Jun 2025 17:31:32 +0100
Subject: [PATCH 2/3] Update with additional drawback

---
 docs/indexing-syntax-long-strings.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/docs/indexing-syntax-long-strings.md b/docs/indexing-syntax-long-strings.md
index ca98bace..7dc8f21a 100644
--- a/docs/indexing-syntax-long-strings.md
+++ b/docs/indexing-syntax-long-strings.md
@@ -40,6 +40,8 @@ All current Luau parser implementations do not follow this behaviour. They inste
 
 Under the current behaviour `foo[bar[[[a]]]` parses as `foo[bar"[a"]`, however this proposed behaviour would parse that as `foo[bar["a"]` with an unterminated bracket pair. This is not backwards compatible, however the author of this RFC believes this to be a more expected behaviour, and could not find evidence of this problematic pattern being used in the wild.
 
+Under the current behaviour `print([[[[a]])` parses as `print("[[a")`. The change proposed here would cause this to parse as `print(["[a")` which is now invalid code. A single use of this was found on GitHub in a Lua file used for a custom nvim init.
+
 ## Alternatives
 There are two alternatives:
 

From afc51d643d166d71538f6519d4ee8ea29b9499f1 Mon Sep 17 00:00:00 2001
From: Bottersnike
Date: Sun, 29 Jun 2025 17:42:50 +0100
Subject: [PATCH 3/3] Add third alternative

---
 docs/indexing-syntax-long-strings.md | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/docs/indexing-syntax-long-strings.md b/docs/indexing-syntax-long-strings.md
index 7dc8f21a..a213d46e 100644
--- a/docs/indexing-syntax-long-strings.md
+++ b/docs/indexing-syntax-long-strings.md
@@ -43,9 +43,12 @@ Under the current behaviour `foo[bar[[[a]]]` parses as `foo[bar"[a"]`, however t
 Under the current behaviour `print([[[[a]])` parses as `print("[[a")`. The change proposed here would cause this to parse as `print(["[a")` which is now invalid code. A single use of this was found on GitHub in a Lua file used for a custom nvim init.
 
 ## Alternatives
-There are two alternatives:
+There are three alternatives:
 
 1. Do absolutely nothing. This is undesirable, as this behaviour remains ambiguous
 2. Retain the current behaviour, defining it explicitly with another RFC
+3. Define `[[[` and `[[=...=[` as syntax errors.
 
-Of these, (2.) would be preferable if the proposed behaviour in this RFC is rejected.
+(3.) may be the "cleanest" of all the solutions here as it resolves the question of how this should be parsed, however it still retains the semantic whitespace behaviour.
+
+(1.) is heavily undesirable, and (2.) would be preferable instead.
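
Note on the Design and Drawbacks sections of the RFC in this series: the difference between the current and proposed lexing can be sketched with a toy tokenizer. This is a minimal Python sketch under stated assumptions, not the Luau or full-moon lexer. The function `lex_brackets`, its `proposed` flag, and the `(kind, text)` token tuples are hypothetical names chosen for illustration, and the sketch only understands `[`, `]`, long strings and runs of other characters. It applies the `[[[` / `[[=` rule literally as the Design section words it.

```python
import re

# Matches an opening long bracket such as "[[" or "[==[" (group 1 = the "=" run).
_LONG_OPEN = re.compile(r"\[(=*)\[")


def lex_brackets(src, proposed):
    """Toy tokenizer (hypothetical, for illustration only).

    proposed=False: a long bracket is consumed wherever one starts.
    proposed=True:  "[[[" and "[[=" first yield a plain "[" token, and only
                    the remainder may open a long string.
    """
    tokens = []
    i = 0
    while i < len(src):
        ch = src[i]
        if ch == "[":
            if proposed and src.startswith(("[[[", "[[="), i):
                # Split off one index bracket before looking for a long string.
                tokens.append(("[", "["))
                i += 1
            m = _LONG_OPEN.match(src, i)
            if m is None:
                tokens.append(("[", "["))
                i += 1
                continue
            close = "]" + m.group(1) + "]"
            end = src.find(close, m.end())
            if end == -1:
                raise SyntaxError("unfinished long string")
            tokens.append(("string", src[m.end():end]))
            i = end + len(close)
        elif ch == "]":
            tokens.append(("]", "]"))
            i += 1
        elif ch.isspace():
            i += 1
        else:
            # Everything else is grouped into an opaque "other" run.
            j = i
            while j < len(src) and src[j] not in "[]" and not src[j].isspace():
                j += 1
            tokens.append(("other", src[i:j]))
            i = j
    return tokens


if __name__ == "__main__":
    for text in ("foo[[[a]]]", "foo[ [[a]]]", "print([[[[a]])"):
        print(text, lex_brackets(text, proposed=False), lex_brackets(text, proposed=True))
```

In this sketch, `foo[[[a]]]` yields the string `[a` followed by a stray `]` under the current rule, and `[`, the string `a`, `]` under the proposed rule, which is the same token stream as `foo[ [[a]]]`. The `print([[[[a]])` case from the Drawbacks section yields the string `[[a` today and an unmatched `[` followed by the string `[a` under the proposal.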