Propose code string literals #3450
it would be nice if there was a way to have the first line have indentation, e.g.: fn f() {
// something like this -- not sure the best way to indicate indentation that should be included
let s = ```
abc
```;
assert_eq!(s, " abc\n");
} |
How big of a problem is this? I am inclined to be against adding more stuff to the language for a problem we don't know we have. |
I've encountered this issue of wanting auto-formatted multi-line strings with embedded indentation multiple times myself. I would likely use ``` code strings a lot, maybe more often than I'd use the |
This design makes it impossible to indent the first line. |
|
cc @rust-lang/rustfmt and @rust-lang/style for awareness |
This seems rather over-indexed on python code. I'm all for preserving whitespace in this way, but why not do it like it is done in doc strings, and just trim as many characters as possible from the left margin (uniformly to all lines)? |
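The doc-string style rule suggested above (strip the largest common left margin from all lines) can be sketched as follows. This is only an illustration: `dedent_min` is a hypothetical helper, not something the RFC or the standard library defines, and it assumes indentation is made of spaces only.

```rust
/// Remove the longest common leading-space margin from every non-blank line.
/// Hypothetical helper for illustration; assumes space-only indentation.
fn dedent_min(text: &str) -> String {
    // Smallest indentation among lines that contain something other than whitespace.
    let min_indent = text
        .lines()
        .filter(|line| !line.trim().is_empty())
        .map(|line| line.len() - line.trim_start_matches(' ').len())
        .min()
        .unwrap_or(0);

    let mut out = String::new();
    for line in text.lines() {
        // Blank lines may be shorter than the margin, so emit them as empty lines.
        if !line.trim().is_empty() {
            out.push_str(&line[min_indent..]);
        }
        out.push('\n');
    }
    out
}

fn main() {
    let src = "    def f():\n        return 1\n";
    assert_eq!(dedent_min(src), "def f():\n    return 1\n");
    // A string whose lines are all indented cannot keep that indentation.
    assert_eq!(dedent_min("    abc\n"), "abc\n");
}
```

The second assertion shows the limitation discussed in this thread: under a minimum-indentation rule, a single indented line (or a block where every line is indented) always loses its leading whitespace.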
i thought of a way the first line could be indented, just have another line be indented less: {
let return_and_close = ```
return retval;
}
```;
assert_eq!(return_and_close, " return retval;\n}\n");
} |
By my reading of the proposal the string literal would be an error because the second line is less indented than the first. |
well, the proposal can fix that :) |
Python is simply the most prominent example from this list: https://en.wikipedia.org/wiki/Off-side_rule If you look through the examples on that page, you'll see they all use indentation in the same way, so I didn't see much need to support indentation on the first line. I went with the more conservative approach for this RFC because a) I was unable to come up with a use-case for indentation on the first line, b) relaxing the rules later is a backwards compatible change, and c) even with the relaxed rules, there are still strings you cannot represent (eg. a single indented line, or all lines indented) and the set of things that can't be represented is much harder to quantify under the relaxed rules. ie. it seems much easier to say that if you need precise control over indentation, use raw strings. |
Co-authored-by: Caleb Cartwright <[email protected]>
The The There are many different crates implementing some form of DSLs are extremely powerful, and this kind of string literal is well suited to embedding DSLs within Rust programs. I think Rust would benefit immensely from having some kind of "relative-indentation" string literal, regardless of whether it takes the exact form proposed here. |
For example, PHP quite recently (in v7.3) added indentation handling to its heredoc strings, so it is a useful feature. |
I think the actual mechanism of three backticks instead of a double quote is perhaps strange. can't we do a hash prefix on the string literal to mark the string literal as being code-ish and then rustfmt can know to format such string literals? |
With all due respect, I don't think we should take notes on what the PHP people are doing. |
Many people, including myself, are writing Rust in production and are using DSLs. You are fixing a problem that I (and anyone I have worked with) didn't know I even had. That should tell you something. Is there anything else to this other than formatting? |
I'm sure I'm not the only one who has written things like this: writeln!(w, " \
<!-- <link rel=\"shortcut icon\" href=\"{rel}favicon.ico\"> -->\
\n</head>\
\n<body>\
\n <div class=\"body\">\
\n <h1 class=\"title\">\
\n {h1}\
\n <span class=\"nav\">{nav}</span>\
\n </h1>") Multiline string literals where you want to be able to indent the body (because having them flush left looks atrocious) but also preserve leading indentation is a mess right now, so I definitely appreciate the motivation behind this RFC. I'm not sure it captures all the things I might want to do though, since for example it might not be the case that the first line is unindented (like in this example, where the first line has a two space indent) or that the minimum indentation is 0 (which is true in this case but might not be if interpolating an inner element instead of the
This seems like quite an unsatisfactory resolution, since the whole point of the syntax is to allow for precise control over indentation which is not otherwise preserved by multiline strings using As the example above should show, if you are outputting indented syntax for a language like python (or in this case, formatted HTML), just because the top level is zero indent doesn't mean that fragments of the string are also zero indent, or that the first line will be zero indent since it might be a fragment of the full output. I have seen the same pattern with formatted Rust code generation, the fragments will generally be inside some kind of scope and hence be nonzero indent, and the repeating block may or may not align with language constructs so the first line might not be the least indented. |
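For readers unfamiliar with the workaround in the `writeln!` example above, here is a small sketch of how string continuations behave in today's Rust. The function names are made up for illustration; the behaviour shown (a trailing backslash removes the newline and the following line's leading whitespace) is standard.

```rust
/// Continuation only: the backslash swallows the newline and the next
/// line's leading whitespace, so nothing separates the two tags.
fn joined() -> &'static str {
    "<head>\
     </head>"
}

/// The workaround: spell out each newline and its indentation by hand
/// after the continuation.
fn kept() -> &'static str {
    "<body>\
     \n  <div class=\"body\">\
     \n  </div>\
     \n</body>"
}

/// No continuation: the source file's own indentation becomes part of the value.
fn leaked() -> &'static str {
    "<body>
        <div>"
}

fn main() {
    assert_eq!(joined(), "<head></head>");
    assert_eq!(kept(), "<body>\n  <div class=\"body\">\n  </div>\n</body>");
    assert!(leaked().starts_with("<body>\n    "));
}
```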
This level of disrespect for other communities is unproductive and doesn't help with the discussion. Whether your feelings are justified or not, it's better to explain why this particular feature doesn't fit for Rust, instead of just showing a general aversion to other language(s). |
Incidentally, the PHP heredoc syntax solves the indentation issue by using the indentation of the closing quote character, rather than the first content line or the minimum indentation. Thus: let x = ```
4 space indented
2 space indented
```;
let x = ```
2 space indented
1 space indented
```;
let x = ```
2 space indented
error, less indented than end delimiter
```; Also note that the newline before the end delimiter does not count as part of the string, you would have to add a |
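The closing-delimiter rule described above can be modelled roughly like this. It is a sketch under stated assumptions, not the RFC's algorithm: `dedent_to_closing` is a hypothetical helper, indentation is assumed to be spaces only, and the caller is assumed to have already separated the body from the closing delimiter line.

```rust
/// Strip indentation relative to the closing delimiter, heredoc-style.
/// `body` is the text between the opening line and the closing delimiter line;
/// `closing_indent` is the number of spaces in front of the closing delimiter.
/// Hypothetical helper for illustration only.
fn dedent_to_closing(body: &str, closing_indent: usize) -> Result<String, String> {
    let mut lines = Vec::new();
    for (i, line) in body.lines().enumerate() {
        // Blank lines are kept as empty lines regardless of their length.
        if line.trim().is_empty() {
            lines.push("");
            continue;
        }
        let indent = line.len() - line.trim_start_matches(' ').len();
        if indent < closing_indent {
            return Err(format!(
                "line {} is less indented than the end delimiter",
                i + 1
            ));
        }
        lines.push(&line[closing_indent..]);
    }
    // The newline before the closing delimiter is not part of the value.
    Ok(lines.join("\n"))
}

fn main() {
    // Closing delimiter indented by 4: the content keeps 4 and 2 spaces.
    let body = "        four\n      two\n";
    assert_eq!(
        dedent_to_closing(body, 4),
        Ok("    four\n  two".to_string())
    );
    // A line left of the closing delimiter is an error.
    assert!(dedent_to_closing("  ok\n bad\n", 2).is_err());
}
```

Because the reference column comes from the closing delimiter rather than from the content, a first line that is indented (or a block where every line is indented) becomes representable.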
TLDR: I think the PHP heredoc syntax is the best so far. The PHP heredoc syntax is basically what I was going to suggest (I didn't know PHP used it), though I didn't since it only works when the closing ``` is at the start of its line (ignoring indentation), which wouldn't work for the RFC's proposed syntax for no trailing let s = ```
line with no ending line terminator```;
This neatly solves the issue by making the syntax for no trailing let s = ```
line with no ending line terminator
```; and the corresponding syntax with a trailing let s = ```
line with a ending line terminator
```; |
text/0000-code-literals.md
Outdated
functionality as required. | ||
|
||
If it is necessary to include triple backticks within a code string | ||
literal, more than three backticks may be used to enclose the |
Hmm, I'm torn here. Using the same thing as in doccomments makes sense, but at the same time when we already have "use more #s to escape more" I don't feel amazing about also having a "use more `s to escape more" construct.
Thanks for the feedback. I've updated the RFC to propose a variant of the "heredoc"-style indentation rules and updated the "prior art" section. I've also attempted to enumerate every possible syntax variation that has been suggested in the alternatives section. I've kept the triple backtick quote style for now, but I am torn between that and some of the other quote styles. However, I think the new choice for the indentation rules is the best option so far, especially when combined with the modification to optionally suppress the final newline. |
Yes, I think this conversation is drifting very offtopic. I believe the central point is that people can and do use DSLs in Rust. This RFC is proposing one way to improve support for people who do. |
Ack & agreed, no dispute about that. Thinking further we should talk (at some point) about "how" people should (be allowed to) use that cool feature. The mentioned "fear" of bloated files with these "code blocks" in them (very lengthy too) could have a serious impact on maintainability (code-quality) and production use of software written in Rust. |
Regarding length enforcement/linting, are there any existing lints around string length? I can't think of any reason why there should be lints for length of indented strings but not regular non-indented strings. |
What about nested strings? let s1 = h#"
let s2 = h##"
let s3 = h###"
string
"###;
"##;
"#; |
This is not (yet) possible if the literal is being passed to a proc-macro. Maybe once proc-macro-expand is stabilized such a lint would be useful (though, it'd need proc-macros to be updated to use expansion) but for now if the literal is going into a proc-macro (likely the common case) it should be suppressed. |
1. It adds a four new types of string literals given all | ||
the combinations. | ||
|
||
# Rationale and alternatives |
Another alternative I have mentioned on zulip:
Improve `include!` handling (when passed as literals to macros? in editors?) instead to make it more ergonomic to outline other-language code rather than inlining.
Pros:
- works better with simple tools that don't handle nested languages well
- establishes a new indent context, i.e. doesn't need to be adjusted with surrounding code, which in my experience can be error-prone if the editor's indentation handling is imperfect. Examples of confusions:
  - inside comments
  - inside macros
  - inside doc comments
  - when current indentation is inconsistent with configured rules
  - when copy-pasting into a differently indented context
- generally avoids stacking complexity
some_proc_macro!{
    mod m {
        /// This is an example with nesting and several levels of indentation and whitespaces
        ///
        /// ```rust
        /// let p = h"python
        ///     def py():
        ///         a = '''Lorem ipsum dolor sit amet,
        ///             consectetur adipiscing elit,
        ///             sed do eiusmod tempor incididunt
        ///             ut labore et dolore magna aliqua.'''
        ///         print(a)
        ///     ";
        /// ```
        ///
        fn nesting_fun() {}
    }
}
Cons:
- requires editor support if you want to view or even edit the included file in the context of its parent instead of opening a new view. But showing an overlay might be less complex than all the text nesting
- context of substitutions may be harder to see
Since this is motivated by making things easier for rustfmt I recommend contacting the maintainers of other tools (syntax highlighters, editors, IDEs, ...) to see if this change helps or adds complexity for them.
I don't consider this an alternative. Requiring powerful editor support to even use the feature makes it a no-go, and having to store things in separate files is a maintenance burden that's worse than the current situation, since it requires coming up with a naming scheme for those files that makes sense, makes it harder to resolve merge conflicts since tools like `git` will never understand this "magic include", and is way more complicated than what is proposed in this RFC.
The advantages you list I also consider to be problems with your approach. You say it works better with simple tools, but the opposite is true: you end up with something unworkable without powerful editor features. In contrast this RFC doesn't require any editor features at all to be an improvement over the status quo. Any support for nested language is an optional extra that doesn't affect the core functionality.
Your example of "stacking complexity" seems very straightforward tbh. Infinitely better than having to go to a separate file.
Since this is motivated by making things easier for rustfmt I recommend contacting the maintainers of other tools (syntax highlighters, editors, IDEs, ...) to see if this change helps or adds complexity for them.
It by definition does not add any complexity for tools other than rustfmt, since the only required change as a result of this RFC is allowing a new prefix letter (`h` proposed here) and tools must already support that. Beside that, anything that is valid to do with a raw string literal is also valid to do with an `h` raw string literal.
It by definition does not add any complexity for tools other than rustfmt, since the only required change as a result of this RFC is allowing a new prefix letter (`h` proposed here) and tools must already support that. Beside that, anything that is valid to do with a raw string literal is also valid to do with an `h` raw string literal.
Anything that adds syntax complicates `syn` and any other tools that use it or otherwise parse rust code. I can't imagine that it would ever be safe to just assume that any string prefix acts like a regular string literal, since raw strings already violate that, hence individual new letters have to be added to anything that parses rust code, including syntax highlighters (although the backup behavior is usually good enough for these).
You end up with something unworkable without powerful editor features.
How? Even many simple editors at least have tabs, panes or similar UI elements to view more than one file at a time.
At its most primitive you rely on your window manager and file browser to open multiple files at the same time in separate windows and show them side by side.
Any support for nested language is an optional extra that doesn't affect the core functionality.
A simple editor can have primitive syntax-highlighting that will work with separate files based on file extensions but won't work with inlined content. So this RFC makes things worse for simple editors
makes it harder to resolve merge conflicts since tools like git will never understand this "magic include"
I don't see how it would make things more difficult for git? If anything it makes diffs simple due to fewer whitespace adjustments.
Requiring powerful editor support to even use the feature makes it a no-go,
Where did I say that a powerful editor would be required? Rather I'm suggesting
a) improve powerful editors
b) keep things simple for simple editors
This covers both.
Your example of "stacking complexity" seems very straightforward tbh. Infinitely better than having to go to a separate file.
What is straight-forward about it? If you actually have to edit, indent, copy-paste, syntax-highlight or auto-complete that there are lots of pitfalls.
Note the outer macro, which tends to make things more difficult for tools because at that point they might not even know anymore whether they're dealing with rust or just things that happen to tokenize like rust.
And it's rust -> macro -> markdown -> codeblock (with language annotation) -> multiline string (with another language annotation).
These languages could be configured to have different indent rules!
I don't consider improving support for `include!`-like things to be a negative (proc-macro-include RFC, proc-macro-expand feature would both be great to have), but it's a feature for different use cases than this. This RFC improves support for things that people are already doing. Even if we had better forms of `include!`, I would not pull out 3 lines of SQL to a separate file just to get syntax highlighting. I would simply do what we do currently: use the existing literal strings and fight with rustfmt every time the surrounding code changes.
Separation of languages is the norm and should be encouraged. See the HTML/CSS/JS split that is encouraged instead of having inline script handlers and styles. See template files. See module trees.
You say my approach is a no-go because it makes things more difficult for simple editors. And yet you acknowledge that this RFC will primarily benefit complex editors. While I think my approach would benefit simple editors because they can then work with the outlined language.
At the moment, these strings are in the file and so can be reviewed and have conflicts resolved in-place. By moving them to a separate file you can no longer perform these actions with any context about the surrounding code.
I assume they'd conventionally still be placed in the same directory and show up in the diffs next to each other.
To make that at all workable you'd need a powerful editor to allow treating them as though they weren't in a separate file
Not necessarily. E.g. when you have an SQL query query!(include!("query.psql"), param1="val", param2="val", ...)
then it has an API, like a function call. You edit functions separately and then fix their callsites.
So "jump to definition" + error messages from the `query!` macro about missing arguments would already cover that.
And yet you acknowledge that this RFC will primarily benefit complex editors.
My expectation is that this RFC will not affect complex editors (in cases where they are not acting as simple editors).
A complex editor that is using heuristics to determine when to apply other-language syntax highlighting to a literal could similarly use those heuristics to determine when to apply other-language auto-formatting to a literal.
This RFC simply provides support for auto-indentation (but not formatting) of literals for simple editors (and complex editors where their heuristics don't apply) that use rustfmt.
EDIT: actually, I forgot that this RFC also included language hints, which would allow a very strong hint to the complex editor heuristics of what other-language to treat a literal as, but it also likely allows editors in between simple and complex to use very simple heuristics and start multi-language highlighting where they couldn't previously.
EDIT2: To clarify some of my categorical assumptions to make sure there's no misunderstanding:
- simple editor: notepad -> notepad++ -> unconfigured vim
- no code understanding or only simple regex based highlighting
- complex editor: neovim/vscode + LSP, jetbrains
- semantic code understanding, so it actually knows which macro literals are being passed to
- in between: minimally configured vim/neovim without an LSP
- still just syntactic code understanding, but better than the simple regexes, so it doesn't know which macro is which to use for multi-language heuristics, but it can parse and use language hints
Separation of languages is the norm and should be encouraged.
That is not my experience. I've almost never seen sql queries pulled out into separate files. Most assembly I've seen is inline. Shader languages are a bit of a mix, and I don't have as much familiarity with it, but I don't think it is at all unusual to include shader code inline, especially if it is small. And this feature would be very useful for help text for cli programs. I can't imagine using a separate file for the help comment for every option in my cli that uses clap.
See the HTML/CSS/JS split that is encouraged instead of having inline script handlers and styles
But we also have frameworks like react, where html and css are embedded in Javascript. Or svelte where the JS is included in an html template.
Shader languages are the one case I can think of where people actually care about "separation of languages", and then it has to do more with the fact that GPU code inherently has a modularity to it, because it is run in passes, and people tend to pull out modules into, well, modules. So you may as well have, e.g.
- code.cpp
- code.hpp
- code.vert
- code.frag
But ofc you may well just encounter something like
- code.cs
- code.hlsl
Depending.
notepad++
Has syntax highlighting.
But we also have frameworks like react, where html and css are embedded in Javascript. Or svelte where the JS is included in an html template.
Yes, and I have encountered issues with that kind of multi-language, framework-specific file formats that makes me prefer separate files. Simple editors just didn't support it at all or mistook it as only one of the languages, complex editors had configuration issues because they picked up the wrong preprocessor version or something which led to lots of bogus squiggles in those files while vanilla JS files had no issues.
Most assembly I've seen is inline.
https://github.com/xiph/rav1e/tree/master/src/arm
https://github.com/memorysafety/rav1d/tree/main/src/x86
https://github.com/rust-lang/stacker/tree/master/psm/src/arch
Though none of that needs to be `include!`ed / act as a template in the first place; it's static code with a fixed interface and compiled separately. I can't think of a project that needs templated ASM.
The reference-level explanation should say what happens in Rust 2018 and earlier (where supporting these literals would be an incompatible change; see reserved-prefixes). |
I do not believe this proposal sufficiently engages with why programming languages other than Python and Markdown make the choices they do. In particular, the Swift programming language chooses to instead reject anything on the first line (before the multiline literal "proper"), and I think for good reasons. It is very easy to go from emitting a string literal something like this:
let text = "text\ntext\ntext\ntext";
To, wanting nicer formatting for generated code, emit this:
let text = "text
text
text
text
";
This causes accidentally losing the first line. Even with a clarification of this RFC to add restrictions to what is allowed to go there so fewer inputs can be silently dropped, I don't think it is very "in character" for Rust to allow code that may be incorrect to pass compiling when it would be very easy to use a slightly different rule and catch a common mistake.
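The hazard described above can be made concrete with a tiny model of the rule under discussion, in which everything between the opening quote and the first newline is treated as a hint rather than content. `split_hint` is a hypothetical function written only for this illustration; it ignores indentation stripping, and its handling of a literal with no newline is an assumption of the sketch, not something taken from the RFC.

```rust
/// Model of the rule under discussion: text before the first newline is a
/// language hint, and only the remainder is the string's value.
/// Returns `(hint, content)`. Hypothetical; indentation handling is ignored.
fn split_hint(literal_body: &str) -> (&str, &str) {
    match literal_body.split_once('\n') {
        Some((hint, rest)) => (hint, rest),
        // Assumption of this sketch: with no newline, nothing is treated as a hint.
        None => ("", literal_body),
    }
}

fn main() {
    // A generator that naively moves "text\ntext\ntext\ntext" into the new
    // literal form would put the first "text" in the hint position.
    let (hint, content) = split_hint("text\ntext\ntext\ntext\n");
    assert_eq!(hint, "text");
    // Only three of the four lines survive, and nothing reports the loss.
    assert_eq!(content, "text\ntext\ntext\n");
}
```

A rule that rejects any content on the opening line, as the comment says Swift does, would turn this silent loss into a compile error.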
Anything directly after the opening quote is not considered | ||
part of the string literal. It may be used as a language hint or |
Anything directly after the opening quote is not considered | |
part of the string literal. It may be used as a language hint or | |
Anything directly after the opening quote is not considered | |
part of the string literal. It may be used as a language hint or |
There is no specified separator aside from the implied separator of the newline. Some people have mistaken this proposal as only allowing a constrained option here. It does not. It says "Anything", and specifies no compiler error if the symbols that immediately follow the `"` are, say... `"`. Perhaps you meant to constrain it.
I'm not sure if this is what you're getting at, but the first line is still constrained by the delimiters of the string, i.e. if the string begins `h"` then a single quote will still close the string even if it's on the first line. If a single quote would not close the string then it would still be allowed on the first line. The indentation and language hint rules apply "after" we've determined the bounds of the string literal.
part of the string literal. It may be used as a language hint or | ||
processed by macros (similar to the treatment of doc comments). | ||
|
||
```rust | ||
let sql = hr#"sql | ||
SELECT * FROM table; | ||
"#; | ||
``` |
I do not believe placing metadata regarding the string inside the visible string delimiter tokens should be accepted, as it has many negative impacts. In particular, there is an isomorphism between strings written using `r#""#` and strings written using `""` (and without `STRING_CONTINUE`, i.e. 0x5C 0x0A), currently, that as far as I know is complete. This proposal would create a surjective function: there would be string literals written using
h"languagetag
"
which have no mirror image using the other syntactic forms for string literals. This causes great amounts of confusion for:
- Lexing
- Parsing
- Code generation
And the very purpose of this language hint is for the service of syntax highlighters and the like, which are very likely going to be written in a language that may have no easy access to simply running syn or tree-sitter or whatever, and may instead be bashed together out of JavaScript and regexes.
In particular, there is an isomorphism between strings written using r#""# and strings written using "" (and without STRING_CONTINUE, i.e. 0x5C 0x0A), currently, that as far as I know is complete.
Not sure what you're getting at here. There are already many ways to get the same "literal value" using different "encodings" of the same literal. For example, tabs could be encoded with `\t` or an actual tab character.
This proposal would create a surjective function: there would be string literals written using
h"languagetag
"which have no mirror image using the other syntactic forms for string literals.
This causes great amounts of confusion for:
Lexing Parsing Code generation
This is going to need some more justification.
First of all, the language hint is purely a syntactic feature, it doesn't change the "value" of a string literal, so in terms of "values" (which if we're using set theoretic terms, is the most plausible thing to talk about but you haven't actually defined that...) there is the same amount of isomorphism between code string literals and raw string literals as there was between raw string literals and string literals (modulo indentation being relative, which is the entire point of the proposal).
Secondly, I flat out don't believe that this does introduce significant complexity in those areas. The compiler/tooling is already capable of dealing with string literals and raw string literals. This RFC doesn't change the basic rules for when a string begins/ends - the parsing rules are identical to the corresponding non-code-literal form. The only change is to how the content within the literal is converted into a value for use by the program.
- Byte string literals `hb"` | ||
- Raw byte string literals `hbr#"` | ||
|
||
The `h` modifier will appear before all characters in the prefix. |
There is no persuasive and particular reason offered to have this precede all other characters in the prefix. It would be preferable to assume that we are going to explore accepting a non-canonical ordering.
Experimentally, `br"<content>"` compiles, but `rb"<content>"` does not compile. This implies that we are already particular about the order of string prefixes, and so I wrote this RFC with consistency in mind. I don't particularly care about what order is "canonical" but this rule was easy to define and seemed reasonably intuitive. If you have a strong reason to prefer a different order I'd love to hear it.
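The existing prefix ordering mentioned above can be checked with a short program. The function names are invented for this sketch; the literals themselves are ordinary stable Rust.

```rust
/// Raw byte string: the prefix order is `b` then `r`, and the backslash stays literal.
fn raw_bytes() -> &'static [u8] {
    br"\n"
}

/// Plain byte string: `\n` is an escape for a single newline byte.
fn cooked_bytes() -> &'static [u8] {
    b"\n"
}

fn main() {
    assert_eq!(raw_bytes(), &[b'\\', b'n'][..]);
    assert_eq!(cooked_bytes(), &[b'\n'][..]);
    // Swapping the prefix letters is rejected at compile time, so this
    // line is left commented out:
    // let bad = rb"\n";
}
```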
An `h` modifier may be added to the prefix of the following string | ||
literal types: | ||
|
||
- String literals `h"` | ||
- Raw string literals `hr#"` | ||
- Byte string literals `hb"` | ||
- Raw byte string literals `hbr#"` |
Why does this not include `c"`?
No particular reason - I was using stable Rust as a baseline, but I can update the RFC to include C string literals. The intent is that they combine in the natural way. That said, it looks like the implementation of the C string literal RFC was reverted due to breakage, so... We'll see.
The main drawback is increased complexity of the language: | ||
|
||
1. It adds a four new types of string literals given all | ||
the combinations. |
Teaching.
String literals are used in pattern matching. It will be very annoying to explain why a metadata tag that can be written as part of the literal and lives inside what appears to be the string's delimiter tokens does or does not participate in pattern matching. I would prefer the question simply not arise.
Specifically, this works:
let "SELECT" = &maybe_select_expr[0..6] else {
return;
};
And I presume, with this proposal, that this would work:
let h"
SELECT
" = &maybe_select_expr[0..6] else {
return;
};
But I do not want to explain why either of these may or may not work:
let h"x86asm
" = &maybe_sql_expr[0..0] else {
return;
};
let h"sql
" = &maybe_sql_expr[0..0] else {
return;
};
All answers seem bad, to me. Introducing a form that allows the question to arise in the first place can simply be avoided.
Sure, we could add a more complex rule for `language_tag`; for example, the first line with a tag must end with `#`.
let h#"sql#
SELECT
"# = &maybe_sql_expr[0..2] else { return; };
You have not actually clarified anything as long as the tag is inside the quotation marks.
I guess the `h#<lang>` syntax was more natural when this RFC was still proposing the markdown-like triple backtick syntax (```<lang>).
Once `feature(stmt_expr_attributes)` is stabilized, I think that would nicely enable something like (even if that is somewhat more verbose):
let sql = #[editor::inject_lang(sql)] h#"
SELECT * FROM table;
"#;
> But I do not want to explain why either of these may or may not work:
As proposed in the RFC, both of those would match, as the language hint is not part of the value. I don't think this case is really any different from, e.g.:
let "\t" = &maybe_tab_expr[0..0] else {
return;
};
let " " = &maybe_tab_expr[0..0] else {
return;
};
Or:
enum Foo {
Bar,
Baz,
}
use Foo::Bar as Bat;
fn main() {
match Foo::Bar {
self::Bat => println!("Bat"),
Foo::Baz => println!("Baz")
}
}
Ultimately, you can't expect pattern matching to be syntactic - it's fundamentally about the value.
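A minimal sketch of the point that patterns compare values, not spellings, using only literal forms that exist in Rust today. The function names (`is_tab`, `is_tab_unicode`, `classify`) are hypothetical and chosen for illustration; nothing here uses the proposed `h"..."` syntax.

```rust
/// Matches a single tab, with the pattern spelled using the `\t` escape.
fn is_tab(s: &str) -> bool {
    matches!(s, "\t")
}

/// Matches the same value, with the pattern spelled using a Unicode escape.
fn is_tab_unicode(s: &str) -> bool {
    matches!(s, "\u{9}")
}

/// Matches the integer 255, with the pattern spelled in hexadecimal.
fn classify(n: u32) -> &'static str {
    match n {
        0xFF => "max byte",
        _ => "other",
    }
}

fn main() {
    // Each pattern accepts the other spelling of the same value.
    assert!(is_tab("\u{9}"));
    assert!(is_tab_unicode("\t"));
    // Digit separators and radix prefixes do not change the value either.
    assert_eq!(classify(2_5_5), "max byte");
    assert_eq!(classify(255), "max byte");
}
```

Under the RFC as described in this thread, a language hint would presumably behave like these alternative spellings: it changes how the literal is written, not the value that is matched.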
Yes yes, and 2_5_5 also is matched by 255 and 0xFF, as those names alias, and if you introduce a specific alias for something, shockingly, it matches. And introducing redundant aliases without consideration for the potential harms to understanding is what I am objecting to.
However, your comments have made apparent to me that you fundamentally do not actually believe this increases language complexity, as you don't think it makes it harder to parse or understand the source code, so I also object to the text written here. If mere quantitative increase in ways to express something does not count as an increase in complexity, then this entry is a lie and there is no drawback.
Suggested change:
- The main drawback is increased complexity of the language:
- 1. It adds a four new types of string literals given all
- the combinations.
+ None.
> However, your comments have made apparent to me that you fundamentally do not actually believe this increases language complexity, as you don't think it makes it harder to parse or understand the source code
I do think it increases language complexity, but only in the sense that the language now has N+1 features rather than N. Where I disagree with you is the idea that there is a qualitative rather than quantitative difference in complexity in comparison to existing string literals.
My hope is that even this incremental increase in complexity could be later reduced: given that the feature is designed to be able to represent every possible string, I think there's a world where a future edition simply makes all multiline literals behave like the literals proposed here. I think it would be appropriate to propose this more drastic change if we later find that the use of code string literals naturally replaces the use of multiline string / raw string literals due to people preferring an "indentation relative form", and if no unforeseen drawbacks are encountered.
I wonder if maybe the tag part should be deferred to a later PR (but kept in the future possibilities section), and for now just error if there is any text on the first line. Although, then there is a risk that macros or external tools rely on that behavior and break if and when tags are added later.
I also think that the RFC should better specify what it means to measure the whitespace. IMO, the cleanest way would be to require that the indentation must exactly match on each line. So, for example, you can't have tabs on one line and spaces on another, or even the same number of spaces and tabs but in a different order. Or go even further and forbid mixed spaces and tabs altogether.
It also feels a little weird to me that the empty string takes multiple lines with this:
let empty = h"
-";
and I'm not overly fond of the "-" to suppress the final newline. I can't think of anything obviously better though.
I will suggest another alternative. The final newline could be suppressed with a backslash on the penultimate line, like so:
let s = h"
something \
";
That doesn't require adding any additional syntax, since it works the same as regular strings.
It's a fair criticism. There's certainly a risk there, but it's difficult to say how significant that risk actually is. Syntax highlighting the "language hint" differently would significantly mitigate that risk, and is trivial to do even if an IDE has no support for syntax highlighting the nested code itself.
My opinion is that you are overstating the risk here: in the example you provided the first line clearly stands out from the rest given the differing indentation, even without syntax highlighting. It's not clear why making a mistake here would be more significant, or harder to catch, than making a mistake anywhere else in the code.
If there was an alternative way to specify the language hint which wasn't worse and avoided the risk entirely, then I would be open to that, but I think the far bigger danger here is ending up with a syntax that is too heavy to use effectively. The current syntax:
let _ = hr#"foo
<content>
"#;
Is about at the limit of what I think is reasonable for such a QoL improvement to cost, so using eg. inline attribute syntax such as:
let _ = #[lang(sql)] hr#"
<content>
"#;
Would be too intrusive, especially the excessive use of #s.
The reason to support the language feature at all is to enable better tooling support. From basic syntax highlighting to more advanced features. It opens up many opportunities that didn't exist before, and is useful information for the programmer to be able to express in the code and for others reading it.
And my opinion is that you are understating it.
My concern includes on-the-fly generated Rust code which is, in a strict, computational sense, impossible for me to eye-check and check-in for every example which I might want to generate, but which I may wish to have nicely formatted when I emit it, nonetheless, for various reasons. For example, it may later be inspected for debugging purposes. I would rather the compiler immediately err in those cases of emitting a malformed string, and that I can begin handling the compiler error that has been propagated into my tools via
and kept them to affecting whitespace which is comparatively easy to reason about, in a quasi-inverse of the rule regarding |
Truly, genuinely, I am content with a change as small as this:
Or if you prefer:
let _ = hr#foo"
<content>
"#;
I believe
But why generate "code string literals" at all in that case? If the code is not intended to be edited by humans, then could you not generate a raw string literal? Let's say for the sake of argument that you both want to generate nicely indented output, and you don't want the extraneous whitespace that would come with a raw string literal, and the generated code is not intended to be checked into source control / generally viewed by a human being. In that case, even with a code string literal you'd need to make sure that every line was properly indented right? In that case, a simple validation rule that would catch mistakes in your code relating to the first line would be to disallow whitespace between the opening quote and the start of the language hint if present.
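The validation rule suggested here (no whitespace between the opening quote and the language hint) could look roughly like the sketch below. `language_hint` is a hypothetical function, not part of the RFC or of any compiler API, and it assumes the caller passes the text between the opening quote and the first newline. Rejecting a whitespace-only first line is a further assumption of this sketch.

```rust
/// Hypothetical check on the text between the opening quote and the first newline.
/// Returns the language hint if present, `None` if the line is empty, and an
/// error if the line starts with whitespace.
fn language_hint(first_line: &str) -> Result<Option<&str>, String> {
    if first_line.is_empty() {
        // No hint at all: the content starts on the next line.
        return Ok(None);
    }
    if first_line.starts_with(char::is_whitespace) {
        // Likely a content line that was meant to go below the opening quote.
        return Err("whitespace between opening quote and language hint".to_string());
    }
    Ok(Some(first_line))
}

fn main() {
    assert_eq!(language_hint(""), Ok(None));
    assert_eq!(language_hint("sql"), Ok(Some("sql")));
    assert!(language_hint("    SELECT 1;").is_err());
}
```

A generator that accidentally emits indented content on the first line would then get an error rather than a silently accepted hint.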
I will add the former as an alternative in the RFC. The latter doesn't really work as the let _ = h"
<content>
"; This is not a strongly held opinion, but I think it's suboptimal to use |
That is not quite my concern. My concern is specifically that I do want it to be potentially viewable by a human being, and that it is somewhere, logged for later review if necessary, but it's not like I am reviewing every single instance on git or whatever. This later review may happen whether the compilation succeeds or fails. Indeed, my concern is I would like to be able to make my codegen nice and legible for the benefit of places that I may never see it, without a concern that the result may be miscompiled. And some of the strings may, indeed, be SQL which my generated Rust code will later tell a database to execute, and I want to make examining the source easy and keep it easy to reason about why things are wrong even for people who may not write Rust programs very often, as they can still examine and easily read nicely formatted SQL that is also nicely formatted in the context of the Rust program. And judging by the occasional error reports I get from these faraway databases, I am pretty sure they're not that familiar with Markdown and its quirks, either. Part of what makes what I have made possible is that rustc is so very enthusiastic already about valid parses, so that I can simply defer a lot of work into the compiler instead of precompiling the code myself, because then it becomes a simple transaction with the compiler. |
This is already specified in the RFC:
That would be an error, since there is no final newline to suppress in that example. The empty string would be simply:
With zero lines between the opening and closing quote, there is no newline to suppress. Contrast this to:
In this case there is a single line, and so there does exist a final newline that can be suppressed. |
Just a suggestion, but one could also look at Nix's syntax for inspiration:
{
environment.etc."auto-cpufreq.conf".text = ''
[charger]
governor = powersave
turbo = never
[battery]
governor = powersave
turbo = never
'';
}
Normal strings are as is, with double quotes. The indentation is cleared by the compiler at compile-time, and if the ending quote
Why not combine the idea of the backtick syntax with the brackets? Example:
everything that replaces |
well, that's currently valid code, so changing it to be a string would conflict:
pub fn foo() {bar
() // weird formatting for calling bar()
}
fn bar() {}
Since |
I almost forgot that unclean syntax is a thing in Rust. I could provide another alternative:
let mymultilinestr = S{<lang>
<string>
};
The S in front of the bracket indicates it is a multi-line string. After that a language tag can be added, and on a new line the multi-line string starts and runs until the last bracket. Because of the prefixing S it does
struct S { lang: u32, b: u32 }
let lang = 1;
let b = 2;
let mymultilinestr = S{lang
, b
};
Right this would conflict. Prefixing it with an
The other alternative I would suggest are back ticks or a backslash to indicate that not a struct is meant. I'm just playing around with ideas, how to make it usable. |
Good point. What about the tilde ("wave")?
let mymultilinestr = ~{<lang>,
<mlstring>
};
A comment can also be used for specifying a language:
let cool_codes = /*rust*/r#"
fn main(){unsafe{*(0 as*mut _)=0}}
"#;
This is similar to how the Helix editor highlights strings for the Nix language.
There seem to be two separate features proposed here:
Also for prior art: https://github.com/tc39/proposal-string-dedent
EDIT:
((block_comment) @injection.language .
(raw_string_literal (string_content) @injection.content)
)
You need to have quite a new Rust grammar for this at the time of writing.
Add a new syntax for multi-line string literals designed to contain code and play nicely with rustfmt.
Rendered