Treat backslash as normal char in TextElements #181

Merged
merged 5 commits into from
Oct 30, 2018
10 changes: 10 additions & 0 deletions spec/CHANGELOG.md
@@ -14,6 +14,16 @@
consistent with how `Identifiers` of variables don't include the `$`
sigil either.

- Treat backslash (`\`) as a regular character in `TextElements`. (#123)

Backslash no longer has special escaping powers when used in
`TextElements`. It's still recognized as special in `StringLiterals`,
however. `StringLiterals` can be used to insert all special-purpose
characters in text. For instance, `{"{"}` will insert the literal opening
curly brace (`{`), `{"\u00A0"}` will insert the non-breaking space, and
`{" "}` can be used to make a translation start or end with whitespace,
which would otherwise be trimmed by `Pattern`.

## 0.7.0 (October 15, 2018)

- Relax the indentation requirement. (#87)
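
The changelog entry above says that escapes are only meaningful inside `StringLiterals`. A minimal sketch of how a consumer might resolve them is shown here. The helper name `unescapeStringLiteral` is hypothetical and not part of this PR, and the sketch assumes the raw literal has already been accepted by the grammar.

```javascript
// Hypothetical helper, not part of this PR: resolve the escape sequences
// that are valid inside a StringLiteral (\", \\ and \uHHHH).
// `raw` is the text between the double quotes, exactly as written in the file.
function unescapeStringLiteral(raw) {
  return raw.replace(/\\(?:u([0-9a-fA-F]{4})|(.))/g, (match, hex, ch) => {
    if (hex !== undefined) {
      // \uHHHH: a single UTF-16 code unit given as four hex digits.
      return String.fromCharCode(parseInt(hex, 16));
    }
    if (ch === "\\" || ch === "\"") {
      // \\ and \" stand for the literal backslash and double quote.
      return ch;
    }
    // Anything else (for example \x) is not a known escape.
    throw new SyntaxError(`Unknown escape sequence: ${match}`);
  });
}

console.log(unescapeStringLiteral("\\u0041"));   // A
console.log(unescapeStringLiteral("\\\\u0041")); // \u0041
console.log(unescapeStringLiteral("{"));         // {
```

Text elements need no such step under the new rule: a backslash there is just a backslash.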
56 changes: 40 additions & 16 deletions spec/fluent.ebnf
@@ -57,7 +57,7 @@ InlineExpression ::= StringLiteral
| inline_placeable

/* Literals */
StringLiteral ::= quote quoted_text_char* quote
StringLiteral ::= "\"" quoted_char* "\""
NumberLiteral ::= "-"? digit+ ("." digit+)?

/* Inline Expressions */
@@ -83,23 +83,47 @@ VariantKey ::= "[" blank? (NumberLiteral | Identifier) blank? "]"
Identifier ::= [a-zA-Z] [a-zA-Z0-9_-]*
Function ::= [A-Z] [A-Z_?-]*

/* Characters */
backslash ::= "\\"
quote ::= "\""
/* Any Unicode character from BMP excluding C0 control characters, space,
* surrogate blocks and non-characters (U+FFFE, U+FFFF).
* Cf. https://www.w3.org/TR/REC-xml/#NT-Char
/* Content Characters
*
* Translation content can be written using most Unicode characters, with the
* exception of C0 control characters (but allowing tab), surrogate blocks and
* non-characters (U+FFFE, U+FFFF).
*/
regular_char ::= [\\u{21}-\\u{D7FF}\\u{E000}-\\u{FFFD}\\u{10000}-\\u{10FFFF}]
text_char ::= blank_inline
| "\u0009"
| /\\u[0-9a-fA-F]{4}/
| (backslash backslash)
| (backslash "{")
| (regular_char - "{" - backslash)
any_char ::= [\\u{9}\\u{20}-\\u{D7FF}\\u{E000}-\\u{FFFD}]
| [\\u{10000}-\\u{10FFFF}]

/* Text elements
*
* The primary storage for content are text elements. Text elements are not
* delimited with quotes and may span multiple lines as long as all lines are
* indented. The opening brace ({) marks a start of a placeable in the pattern
* and may not be used in text elements verbatim. Due to the indentation
* requirement some text characters may not appear as the first character on a
* new line.
*/
special_text_char ::= "{"
text_char ::= any_char - special_text_char
indented_char ::= text_char - "}" - "[" - "*" - "."
quoted_text_char ::= (text_char - quote)
| (backslash quote)

/* String literals
*
* For special-purpose content, quoted string literals can be used where text
* elements are not a good fit. String literals are delimited with double
* quotes and may not contain line breaks. String literals use the backslash
* (\) as the escape character. The literal double quote can be inserted via
* the \" escape sequence. The literal backslash can be inserted with \\. The
* literal opening brace ({) is allowed in string literals because they may not
* comprise placeables.
*/
special_quoted_char ::= "\""
| "\\"
special_escape ::= "\\" special_quoted_char
unicode_escape ::= "\\u" /[0-9a-fA-F]{4}/
quoted_char ::= (any_char - special_quoted_char)
| special_escape
| unicode_escape

/* Numbers */
digit ::= [0-9]

/* Whitespace */
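
The new `any_char`, `text_char` and `indented_char` productions can be read as nested character predicates. The following is a rough sketch of that reading using plain regular expressions rather than the combinators in `syntax/grammar.mjs`. The names `ANY_CHAR`, `isTextChar` and `isIndentedChar` are illustrative only.

```javascript
// Illustrative predicates mirroring the EBNF productions above.
// Each function expects a string holding exactly one code point.

// any_char: tab, then U+0020..U+D7FF, U+E000..U+FFFD, U+10000..U+10FFFF.
const ANY_CHAR =
  /^[\u{9}\u{20}-\u{D7FF}\u{E000}-\u{FFFD}\u{10000}-\u{10FFFF}]$/u;

// text_char: any_char minus the opening brace, which starts a placeable.
function isTextChar(ch) {
  return ANY_CHAR.test(ch) && ch !== "{";
}

// indented_char: text_char minus the characters that are reserved at the
// start of an indented line.
function isIndentedChar(ch) {
  return isTextChar(ch) && !["}", "[", "*", "."].includes(ch);
}

console.log(isTextChar("\\"));    // true
console.log(isTextChar("{"));     // false
console.log(isIndentedChar("*")); // false
```

Note that the backslash passes `isTextChar`, which is the point of this change.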
104 changes: 69 additions & 35 deletions syntax/grammar.mjs
@@ -199,9 +199,9 @@ let InlineExpression = defer(() =>
/* Literals */
let StringLiteral = defer(() =>
sequence(
quote,
repeat(quoted_text_char),
quote)
string("\""),
repeat(quoted_char),
string("\""))
.map(element_at(1))
.map(join)
.chain(into(FTL.StringLiteral)));
@@ -370,51 +370,85 @@ let Function =
.map(join)
.chain(into(FTL.Function));

/* ---------- */
/* Characters */
/* -------------------------------------------------------------------------- */
/* Content Characters
*
* Translation content can be written using most Unicode characters, with the
* exception of C0 control characters (but allowing tab), surrogate blocks and
* non-characters (U+FFFE, U+FFFF).
*/

let backslash = string("\\");
let quote = string("\"");
let any_char =
either(
charset("\\u{9}\\u{20}-\\u{D7FF}\\u{E000}-\\u{FFFD}"),
charset("\\u{10000}-\\u{10FFFF}"));

/* Any Unicode character from BMP excluding C0 control characters, space,
* surrogate blocks and non-characters (U+FFFE, U+FFFF).
* Cf. https://www.w3.org/TR/REC-xml/#NT-Char
/* -------------------------------------------------------------------------- */
/* Text elements
*
* The primary storage for content are text elements. Text elements are not
* delimited with quotes and may span multiple lines as long as all lines are
* indented. The opening brace ({) marks a start of a placeable in the pattern
* and may not be used in text elements verbatim. Due to the indentation
* requirement some text characters may not appear as the first character on a
* new line.
*/
let regular_char =
charset("\\u{21}-\\u{D7FF}\\u{E000}-\\u{FFFD}\\u{10000}-\\u{10FFFF}");

let text_char = defer(() =>
either(
blank_inline,
string("\u0009"),
regex(/\\u[0-9a-fA-F]{4}/),
sequence(
backslash,
backslash).map(join),
sequence(
backslash,
string("{")).map(join),
and(
not(backslash),
not(string("{")),
regular_char)));
let special_text_char =
string("{");

let indented_char = defer(() =>
let text_char =
and(
not(special_text_char),
any_char);

let indented_char =
and(
not(string(".")),
not(string("*")),
not(string("[")),
not(string("}")),
text_char));
text_char);

let quoted_text_char =
/* -------------------------------------------------------------------------- */
/* String literals
*
* For special-purpose content, quoted string literals can be used where text
* elements are not a good fit. String literals are delimited with double
* quotes and may not contain line breaks. String literals use the backslash
* (\) as the escape character. The literal double quote can be inserted via
* the \" escape sequence. The literal backslash can be inserted with \\. The
* literal opening brace ({) is allowed in string literals because they may not
* comprise placeables.
*/

let special_quoted_char =
either(
string("\""),
string("\\"));

let special_escape =
sequence(
string("\\"),
special_quoted_char)
.map(join);

let unicode_escape =
sequence(
string("\\u"),
regex(/[0-9a-fA-F]{4}/))
.map(join);

let quoted_char =
either(
and(
not(quote),
text_char),
sequence(
backslash,
quote).map(join));
not(special_quoted_char),
any_char),
special_escape,
unicode_escape);

/* ------- */
/* Numbers */

let digit = charset("0-9");
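
For comparison, the `quoted_char` alternatives above can be collapsed into a single regular expression. This is only a sketch for checking one quoted literal in isolation. `STRING_LITERAL` and `isStringLiteral` are hypothetical names, and the combinator-based grammar in this file remains the reference.

```javascript
// Sketch: one regular expression equivalent to
//   "\"" quoted_char* "\""
// where quoted_char is any_char minus the double quote (U+0022) and the
// backslash (U+005C), or a special escape (\" or \\), or a \uHHHH escape.
const STRING_LITERAL = new RegExp(
  "^\"(?:" +
    "[\\u{9}\\u{20}\\u{21}\\u{23}-\\u{5B}\\u{5D}-\\u{D7FF}" +
    "\\u{E000}-\\u{FFFD}\\u{10000}-\\u{10FFFF}]" + // plain characters
    "|\\\\[\"\\\\]" +                              // \" or \\
    "|\\\\u[0-9a-fA-F]{4}" +                       // \uHHHH
  ")*\"$",
  "u");

function isStringLiteral(source) {
  return STRING_LITERAL.test(source);
}

console.log(isStringLiteral("\"\\\\\""));   // true:  "\\"
console.log(isStringLiteral("\"\\\\\"\"")); // false: "\\"" has a stray quote
console.log(isStringLiteral("\"\\x\""));    // false: \x is not a known escape
```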

23 changes: 18 additions & 5 deletions test/fixtures/escaped_characters.ftl
@@ -1,9 +1,22 @@
backslash = Value with \\ (an escaped backslash)
closing-brace = Value with \{ (a closing brace)
unicode-escape = \u0041
escaped-unicode = \\u0041
## Literal text
text-backslash-one = Value with \ a backslash
text-backslash-two = Value with \\ two backslashes
text-backslash-brace = Value with \{placeable}
text-backslash-u = \u0041
text-backslash-backslash-u = \\u0041

## String Expressions
## String literals
quote-in-string = {"\""}
backslash-in-string = {"\\"}
# ERROR Mismatched quote
mismatched-quote = {"\\""}
# ERROR Unknown escape
unknown-escape = {"\x"}

## Unicode escapes
string-unicode-sequence = {"\u0041"}
string-escaped-unicode = {"\\u0041"}

## Literal braces
brace-open = An opening {"{"} brace.
brace-close = A closing } brace.
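
To make the expected reading of the literal-text fixtures concrete, here is a deliberately naive sketch that splits a single-line value into text and placeable parts. `splitInlinePattern` is a hypothetical helper, not the project's parser. It assumes no nested placeables and no closing brace inside a string literal.

```javascript
// Naive illustration only: split a single-line pattern at "{" ... "}".
// Backslashes get no special treatment, matching the new TextElement rule.
function splitInlinePattern(value) {
  const parts = [];
  let text = "";
  let i = 0;
  while (i < value.length) {
    if (value[i] === "{") {
      const close = value.indexOf("}", i);
      if (close === -1) {
        throw new SyntaxError("Unclosed placeable");
      }
      if (text) {
        parts.push({type: "text", value: text});
        text = "";
      }
      parts.push({type: "placeable", value: value.slice(i + 1, close)});
      i = close + 1;
    } else {
      // Every other character, including "\" and "}", is plain text.
      text += value[i];
      i++;
    }
  }
  if (text) {
    parts.push({type: "text", value: text});
  }
  return parts;
}

// text-backslash-brace: the backslash stays in the text, the brace still
// opens a placeable.
console.log(splitInlinePattern("Value with \\{placeable}"));
```

Under this reading, `text-backslash-brace` yields the text `Value with \` followed by a placeable, and `brace-close` is a single text element.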