Commit 00e57df

feat: better asm parsing according to upstream Tact compiler parser
Additionally, a lot of choices were made to better accommodate the future Tact assembly grammar (not in syntax, but in node names and layout) and to facilitate language server use. Closes #49
1 parent 717e96d commit 00e57df

9 files changed

+7509
-7342
lines changed

.github/workflows/ci.yml

+2
@@ -47,4 +47,6 @@ jobs:
         if: ${{ runner.os != 'Windows' }}
         run: |
           git clone https://github.com/tact-lang/tact.git -b "v$(jq -r '.version' < package.json)"
+          npm run parse -- -q tact/src/grammar/next/test/*.tact
           npm run parse -- -q tact/src/grammar/test/*.tact
+          # TODO: items-asm-funs.tact must be changed

README.md

+3-4
@@ -6,13 +6,11 @@

 A fully-featured 🌳 [Tree-sitter](https://github.com/tree-sitter/tree-sitter) grammar for the ⚡ Tact contract programming language:

-- 🍰 Parses whole Tact grammar as defined in [grammar.ohm](https://github.com/tact-lang/tact/blob/main/src/grammar/grammar.ohm) (with performance and usability in mind).
+- 🍰 Parses whole Tact grammar as defined in [grammar.gg](https://github.com/tact-lang/tact/blob/da4b8d82128cf4b6f9b04d93a93a9382407112c2/src/grammar/next/grammar.gg) (with performance and usability in mind).
 - 🎨 Provides highlighting, scoping and tagging [queries](#-structure).
 - ⚙ Test-covered (including queries), reflects latest Tact language updates.
 - 🚀 See guidelines on usage and integration in editors supporting Tree-sitter [below](#-usage).

-Note, that the only limiting point are the `asm` functions introduced in Tact 1.5.0 — their bodies doesn't produce any highlighting and can be ill-parsed for now, so expect ERROR nodes in the parse tree. In the future, this is planned to be resolved by an external scanner — it can parse much more, and it can yield more tokens for subsequent highlighting.
-
 ## 🚀 Usage

 ### Neovim
@@ -205,7 +203,8 @@ To find highlighting and other queries for specific editors, look in the `editor

 ## ⚙ References

-- [grammar.ohm](https://github.com/tact-lang/tact/blob/main/src/grammar/grammar.ohm) — Official grammar specification in Ohm PEG language.
+- [grammar.gg](https://github.com/tact-lang/tact/blob/da4b8d82128cf4b6f9b04d93a93a9382407112c2/src/grammar/next/grammar.gg) — Official Tact grammar specification.
+- [grammar.ohm](https://github.com/tact-lang/tact/blob/da4b8d82128cf4b6f9b04d93a93a9382407112c2/src/grammar/prev/grammar.ohm) — Previous, now outdated Tact grammar specification in Ohm PEG language.
 - [tact-by-example](https://github.com/tact-lang/tact-by-example) — Many different contract samples.

 ## Useful ⚡ Tact links

grammar.js

+88-28
@@ -218,47 +218,107 @@ module.exports = grammar({
     asm_arrangement_rets: ($) =>
       seq("->", repeat1(alias($._decimal_integer, $.integer))),

-    asm_function_body: ($) =>
-      seq(
-        "{",
-        prec.right(
-          repeat(
-            choice(
-              // list with { }
-              $.asm_list,
-              // others
-              $._asm_instruction,
-            ),
-          ),
+    // NOTE:
+    // The following asm-related pieces intentionally differ from the grammar.gg.
+    // This is done to provide a better API for the language server.
+    //
+    // There's no catch because there's no well-defined Tact assembly syntax
+    // that we've agreed upon — the current parser in the compiler
+    // simply produces a large string to be passed as-is to the rest of the pipeline.
+    //
+    // Therefore, most of the things below would be internally refactored and/or removed completely once there's a proper definition of the Tact assembly.
+    // (That does NOT require Fift removal, a first step might be just
+    // converting our syntax to bits of the Fift syntax seen below)
+    //
+    asm_function_body: ($) => seq("{", repeat($.asm_expression), "}"),
+
+    // Zero or more arguments, followed by a TVM instruction
+    asm_expression: ($) =>
+      prec.right(
+        seq(
+          field("arguments", optional($.asm_argument_list)),
+          field("name", $.tvm_instruction),
         ),
-        prec.right("}"),
       ),

-    asm_list: ($) => seq("{", /\s/, repeat($._asm_instruction), "}", /\s/),
+    // One or more primitives
+    asm_argument_list: ($) => repeat1($._asm_primitive),

-    _asm_instruction: ($) =>
+    // See comments for each
+    _asm_primitive: ($) =>
       choice(
-        // string
-        $._asm_string,
-        // char
-        seq("char", /\s/, /\S/, /\s/),
-        // custom
-        /\S+/, // NOTE: this point can be significantly improved
+        $.asm_sequence,
+        $.asm_string,
+        $.asm_hex_bitstring,
+        $.asm_bin_bitstring,
+        $.asm_boc_hex,
+        $.asm_control_register,
+        $.asm_stack_register,
+        $.asm_integer,
       ),

-    _asm_string: (_) =>
+    // <{ ... }>
+    asm_sequence: ($) =>
+      seq("<{", repeat($.asm_expression), choice("}>CONT", "}>")),
+
+    // "..."
+    asm_string: (_) =>
       seq(
         choice('abort"', '."', '+"', '"'),
         token.immediate(prec(1, /[^"]+/)),
         token.immediate('"'),
-        /\s/,
       ),

-    // NOTE: May be re-introduced in the future, unused in the current parser
-    // listNoStateCheck
-    // seq("({)", /\s/, repeat($._asm_instruction), "(})", /\s/),
-    // hexLiteral
-    // _asm_hex_literal: (_) => /[xB]\{[\s\da-fA-F]*_?\s*\}\s/,
+    // x{DEADBEEF_}
+    // x{babecafe}
+    // x{}
+    asm_hex_bitstring: (_) => /x\{[a-fA-F0-9]*_?\}/,
+
+    // b{011101010}
+    // b{}
+    asm_bin_bitstring: (_) => /b\{[01]*\}/,
+
+    // B{DEADBEEF_} B>boc
+    // B{babecafe} B>boc
+    // B{} B>boc
+    // <b b>
+    asm_boc_hex: (_) => choice(/B\{[a-fA-F0-9]*_?\}\s+B>boc/, /<b\s+b>/),
+
+    // c0
+    // c15
+    asm_control_register: (_) => /c\d\d?/,
+
+    // s0
+    // s15
+    // 16 s()
+    asm_stack_register: (_) => choice(/s\d\d?/, /\d\d?\d?\s+s\(\)/),
+
+    // 0
+    // 500
+    // -42
+    // 0b10
+    // 0xff
+    // 0xFF
+    asm_integer: (_) => {
+      const hex_literal = /-?0x[a-fA-F0-9]+/;
+      const bin_literal = /-?0b[01]+/;
+      const dec_literal = /-?\d+/;
+
+      return token(choice(
+        hex_literal, // hexadecimal
+        bin_literal, // binary
+        dec_literal, // decimal
+      ));
+    },
+
+    // MYCODE
+    // HASHEXT_SHA256
+    // ADDRSHIFTMOD ADDRSHIFT#MOD
+    // IF IF:
+    // XCHG3 XCHG3_l
+    // 2SWAP SWAP2
+    // ROT -ROT
+    tvm_instruction: (_) => /-?[A-Z0-9_#:]+l?/,

     /* Functions */
