Commit 0325bb2

Merge pull request #42 from projectfluent/processor
Pattern Processor
2 parents 4b9e889 + c89c2cf commit 0325bb2

5 files changed

+472
-0
lines changed

fluent.syntax/docs/processing.rst

Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
+Processing
+==========
+
+.. code-block:: kotlin
+
+   package org.projectfluent.syntax.processor
+
+   import org.projectfluent.syntax.ast.*
+
+   /**
+    * Process patterns by returning new patterns with elements transformed.
+    */
+   class Processor {
+       /**
+        * "Bake" the values of StringLiterals into TextElements. This is a lossy
+        * transformation for literals which are not special in Fluent syntax.
+        */
+       fun unescapeLiteralsToText(pattern: Pattern): Pattern
+
+       /**
+        * "Un-bake" special characters into StringLiterals, which would otherwise
+        * cause syntax errors with Fluent parsers.
+        */
+       fun escapeTextToLiterals(pattern: Pattern): Pattern
+   }

fluent.syntax/docs/reference.rst

Lines changed: 1 addition & 0 deletions
@@ -16,3 +16,4 @@ provide more fine-grained control and detail.
    ast
    visitor
    serializing
+   processing

fluent.syntax/docs/usage.rst

Lines changed: 89 additions & 0 deletions
@@ -4,6 +4,7 @@ Using syntax
 The ``org.projectfluent:syntax`` package provides a parser, a serializer, and libraries
 for analysis and processing of Fluent files.
 
+
 Parsing
 -------
 
@@ -29,6 +30,7 @@ To create Fluent syntax from AST objects, use
    serializer.serialize(resource)
    serializer.serialize(resource.body[0])
 
+
 Analysis (Visitor)
 ------------------
 
@@ -51,6 +53,7 @@ to continue iteration.
        }
    }
 
+
 Custom Traversal (childrenOf)
 -----------------------------
 
@@ -69,3 +72,89 @@ iterate over children.
        listOf("default", "key", "span", "value"),
        variant_props.map { (name, _) -> name }.sorted().toList()
    )
+
+
+Pattern Processing
+------------------
+
+The :py:class:`syntax.processor.Processor` class can be used to transform
+patterns in a way that is friendly to localization workflows which want to
+allow text characters which are special in Fluent to be written as regular
+text.
+
+According to the Fluent syntax, characters such as curly braces must be
+enclosed in :py:class:`syntax.ast.StringLiteral` instances if they are
+supposed to be part of the translation content. Otherwise, an open curly
+brace would start a :py:class:`syntax.ast.Placeable` and likely lead to a
+syntax error.
+
+Workflows in which the support for Fluent placeables is limited may choose to
+provide their own visual cues for them. This often comes in the form of visual
+placeholders which can be rearranged within a translation segment by the
+translator, but whose contents cannot be modified.
+
+In these workflows, the special meaning of characters such as curly
+braces is void; the translator is not able to insert new placeables by
+opening a curly brace anyway. Thus, for translators' convenience, the curly
+brace can be treated as a regular text character and part of the translation
+content.
+
+The :py:class:`syntax.processor.Processor`'s methods allow baking
+:py:class:`syntax.ast.StringLiteral` instances into surrounding
+:py:class:`syntax.ast.TextElement` instances, and then "un-baking" them again
+if required by the Fluent syntax. Note that all string literals are baked,
+while only some are un-baked. The processing is a lossy transformation.
+
+.. note::
+   Processed patterns are not valid Fluent AST nodes anymore and must not be
+   serialized without first un-processing them.
+
+
+Baking literals into text
+^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Use the :py:func:`unescapeLiteralsToText` method to bake the values of string
+literals into the surrounding text elements. This is a lossy transformation
+for literals which are not special in Fluent syntax.
+
+Examples::
+
+    →Hello, {"{-_-}"}.
+    ←Hello, {-_-}.
+
+    →{" "}Hello, world!
+    ← Hello, world!
+
+    →A multiline pattern:
+     {"*"} Asterisk is special
+    ←A multiline pattern:
+     * Asterisk is special
+
+    →Copyright {"\u00A9"} 2020
+    ←Copyright © 2020
+
+
+Un-baking special characters into literals
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Use the :py:func:`escapeTextToLiterals` method to un-bake special characters
+into string literals, which would otherwise cause syntax errors with Fluent
+parsers. Character sequences which might have been previously enclosed in
+string literals will not be un-baked as long as they are valid text
+characters in Fluent syntax.
+
+Examples::
+
+    →Hello, {-_-}.
+    ←Hello, {"{"}-_-{"}"}.
+
+    → Hello, world!
+    ←{""} Hello, world!
+
+    →A multiline pattern:
+     * Asterisk is special
+    ←A multiline pattern:
+     {"*"} Asterisk is special
+
+    →Copyright © 2020
+    ←Copyright © 2020
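
As a rough illustration of the un-baking rules documented above, the sketch below applies them to a plain string rather than to a Pattern. The helper name `unbakeText` is hypothetical and not part of the library. It only mirrors the rules visible in this commit: curly braces are always wrapped, `[`, `*` and `.` are wrapped only at the start of a line, and leading or trailing whitespace is anchored with an empty literal.

```kotlin
// Sketch only: `unbakeText` is a hypothetical helper that works on plain
// strings, whereas the Processor in this commit works on Pattern AST nodes.
fun unbakeText(text: String): String {
    val out = StringBuilder()
    // Anchor leading whitespace with an empty literal so it is not lost.
    if (text.startsWith(' ') || text.startsWith('\n')) out.append("{\"\"}")
    for ((i, char) in text.withIndex()) {
        val atLineStart = i > 0 && text[i - 1] == '\n'
        when {
            // Braces would always open or close a placeable.
            char == '{' || char == '}' -> out.append("{\"").append(char).append("\"}")
            // These characters are special only as the first character of a line.
            atLineStart && char in "[*." -> out.append("{\"").append(char).append("\"}")
            else -> out.append(char)
        }
    }
    // Anchor trailing whitespace the same way.
    if (text.endsWith(' ') || text.endsWith('\n')) out.append("{\"\"}")
    return out.toString()
}
```

Under these assumptions, `unbakeText("Hello, {-_-}.")` yields `Hello, {"{"}-_-{"}"}.`, and a string without special characters such as `Copyright © 2020` is returned unchanged.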
Lines changed: 162 additions & 0 deletions
@@ -0,0 +1,162 @@
+package org.projectfluent.syntax.processor
+
+import org.projectfluent.syntax.ast.* // ktlint-disable no-wildcard-imports
+import java.lang.Exception
+
+/**
+ * Process patterns by returning new patterns with elements transformed.
+ */
+class Processor {
+    /**
+     * "Bake" the values of StringLiterals into TextElements. This is a lossy
+     * transformation for literals which are not special in Fluent syntax.
+     */
+    fun unescapeLiteralsToText(pattern: Pattern): Pattern {
+        val result = Pattern()
+        for (elem in textFromLiterals(pattern)) {
+            result.elements.add(elem)
+        }
+        return result
+    }
+
+    /**
+     * "Un-bake" special characters into StringLiterals, which would otherwise
+     * cause syntax errors with Fluent parsers.
+     */
+    fun escapeTextToLiterals(pattern: Pattern): Pattern {
+        val result = Pattern()
+        for (elem in literalsFromText(pattern)) {
+            result.elements.add(elem)
+        }
+        return result
+    }
+
+    private fun textFromLiterals(pattern: Pattern) = sequence {
+        var lastText: TextElement? = null
+        pattern.elements.forEach { element ->
+            when (element) {
+                is TextElement -> {
+                    if (lastText == null) {
+                        lastText = element
+                    } else {
+                        lastText?.let { it.value += element.value }
+                    }
+                }
+                is Placeable -> {
+                    when (val expression = element.expression) {
+                        is StringLiteral -> {
+                            var content = expression.value
+                            content = special.replace(content) { m -> unescape(m) }
+                            if (lastText == null) {
+                                lastText = TextElement("")
+                            }
+                            lastText?.let { it.value += content }
+                        }
+                        is SelectExpression -> {
+                            val processedVariants: MutableList<Variant> = mutableListOf()
+                            for (variant in expression.variants) {
+                                val processedVariant = Variant(variant.key, unescapeLiteralsToText(variant.value), variant.default)
+                                processedVariants.add(processedVariant)
+                            }
+                            val processedSelect = SelectExpression(expression.selector, processedVariants)
+                            val placeable = Placeable(processedSelect)
+
+                            lastText?.let {
+                                yield(it)
+                                lastText = null
+                            }
+                            yield(placeable)
+                        }
+                        else -> {
+                            lastText?.let {
+                                yield(it)
+                                lastText = null
+                            }
+                            yield(element)
+                        }
+                    }
+                }
+            }
+        }
+        lastText?.let { yield(it) }
+    }
+
+    private fun literalsFromText(pattern: Pattern) = sequence {
+        pattern.elements.forEach { element ->
+            when (element) {
+                is TextElement -> {
+                    if (element.value.startsWith(' ') || element.value.startsWith('\n')) {
+                        val expr = StringLiteral("")
+                        yield(Placeable(expr))
+                    }
+
+                    var startIndex = 0
+                    for (i in element.value.indices) {
+                        when (val char = element.value[i]) {
+                            '{', '}' -> {
+                                val before = element.value.substring(startIndex, i)
+                                if (before.isNotEmpty()) {
+                                    yield(TextElement(before))
+                                }
+                                val expr = StringLiteral(char.toString())
+                                yield(Placeable(expr))
+                                startIndex = i + 1
+                            }
+                            '[', '*', '.' -> {
+                                if (i > 0 && element.value[i - 1] == '\n') {
+                                    val before = element.value.substring(startIndex, i)
+                                    yield(TextElement(before))
+                                    val expr = StringLiteral(char.toString())
+                                    yield(Placeable(expr))
+                                    startIndex = i + 1
+                                }
+                            }
+                        }
+                    }
+
+                    // Yield the remaining text, including a single trailing character.
+                    if (startIndex <= element.value.lastIndex) {
+                        val text = element.value.substring(startIndex)
+                        yield(TextElement(text))
+                    }
+
+                    if (element.value.endsWith(' ') || element.value.endsWith('\n')) {
+                        val expr = StringLiteral("")
+                        yield(Placeable(expr))
+                    }
+                }
+                is Placeable -> {
+                    when (val expression = element.expression) {
+                        is SelectExpression -> {
+                            val rawVariants: MutableList<Variant> = mutableListOf()
+                            for (variant in expression.variants) {
+                                val rawVariant = Variant(variant.key, escapeTextToLiterals(variant.value), variant.default)
+                                rawVariants.add(rawVariant)
+                            }
+                            val rawSelect = SelectExpression(expression.selector, rawVariants)
+                            val placeable = Placeable(rawSelect)
+                            yield(placeable)
+                        }
+                        else -> {
+                            yield(element)
+                        }
+                    }
+                }
+            }
+        }
+    }
+
+    private val special =
+        """\\(([\\"])|(u[0-9a-fA-F]{4}))""".toRegex()
+
+    private fun unescape(matchResult: MatchResult): CharSequence {
+        val matches = matchResult.groupValues.drop(2).listIterator()
+        val simple = matches.next()
+        if (simple != "") { return simple }
+        val uni4 = matches.next()
+        if (uni4 != "") {
+            return uni4.substring(1).toInt(16).toChar().toString()
+        }
+        throw Exception("Unexpected")
+    }
+}
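
The `unescape` helper above resolves the escape sequences `\\`, `\"` and `\uHHHH` found in a string literal's value before the value is baked into text. A minimal standalone sketch of the same idea on plain strings might look like the following. The name `bakeLiteralValue` is hypothetical and not part of the library, and the regular expression is a slightly simplified variant of the one in the commit.

```kotlin
// Sketch only: `bakeLiteralValue` is a hypothetical helper, not library API.
// Group 1 captures a simple escape (\\ or \"), group 2 the four hex digits
// of a \uHHHH escape.
private val escapeSequence = Regex("""\\(?:([\\"])|u([0-9a-fA-F]{4}))""")

fun bakeLiteralValue(raw: String): String =
    escapeSequence.replace(raw) { match ->
        val simple = match.groupValues[1]
        if (simple.isNotEmpty()) {
            // \\ or \" becomes the bare character.
            simple
        } else {
            // \uHHHH becomes the character with that code unit.
            match.groupValues[2].toInt(16).toChar().toString()
        }
    }
```

With this sketch, a literal value containing the six characters `\u00A9` is turned into `©`, while a value without escapes such as `{-_-}` passes through unchanged.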
