Commit dcc85f1

rewrite grammar, remove scanner
Problem:
Hand-written C scanner is hard to maintain, slow, and hangs on files like
`filetype.txt` and `usr_24.txt`.

Solution:
Delete hand-written C scanner, define grammar fully in `grammar.js`
- introduce `url`
- introduce `block`, a group of lines. (does not support nesting yet)
- introduce `line_li` for listitems. (does not support nesting yet)
- keycodes #1
- `[range]` #1

fix #1
fix #7
fix #9
fix #10
fix #11
fix #14
fix #12 (except nested)
fix #13 (except nested)
1 parent d1900d9 commit dcc85f1
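
The commit message describes `block` as a group of lines, and the README added in this commit implies that a block normally ends with a blank line. As a rough illustration only (plain Python, not the grammar's actual rule; `group_blocks` is a hypothetical helper), that grouping could be sketched as:

```python
def group_blocks(text: str) -> list[list[str]]:
    """Group consecutive non-blank lines into blocks.

    Assumption: a blank (or whitespace-only) line ends the current block.
    Nesting is ignored, as in the commit message.
    """
    blocks: list[list[str]] = []
    current: list[str] = []
    for line in text.splitlines():
        if line.strip():
            current.append(line)
        elif current:
            # Blank line: close the block collected so far.
            blocks.append(current)
            current = []
    if current:
        # Last block had no trailing blank line.
        blocks.append(current)
    return blocks
```

The final `if current` branch is the case the README calls out: a last block that is missing its trailing blank line still has to be emitted.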

14 files changed, +994 -376 lines changed
README.md

+45
@@ -0,0 +1,45 @@
+tree-sitter-vimdoc
+==================
+
+This grammar intentionally supports a subset of the vimdoc "spec"; predictable
+results are the primary goal, so that _output_ formats (e.g. HTML) are
+well-formed; the _input_ (vimdoc) is secondary. The first step should always be
+to try to fix the input (within reason) rather than insist on a grammar that
+handles vimdoc's endless quirks.
+
+Notes
+-----
+
+- vimdoc format "spec":
+  - [:help help-writing](https://neovim.io/doc/user/helphelp.html#help-writing)
+  - https://github.com/nanotee/vimdoc-notes
+- `(code_block)` is contained by `(line)` because `>` can start a code block at the end of a line.
+
+Known issues
+------------
+
+- `line` in a `code_block` does not contain `word` atoms, it's just the full
+  raw text line including whitespace. This is somewhat dictated by its
+  "preformatted" nature; parsing the contents implies loading a "child"
+  language (injection). See [#2](https://github.com/vigoux/tree-sitter-vimdoc/issues/2).
+- `url` doesn't handle _surrounding_ parens. E.g. `(https://example.com/#yay)` yields `word`
+- `url` doesn't handle _nested_ parens. E.g. `(https://example.com/(foo)#yay)`
+- Ideally `block_end` should consume the last block of the document _only_ if that
+  block is missing a trailing blank line or EOL ("\n").
+  - TODO: consider simply _not supporting_ docs without EOL?
+- Ideally `line_noeol` should consume the last line of the document _only_ if
+  that line is missing EOL ("\n").
+  - TODO: consider simply _not supporting_ docs without EOL?
+
+TODO
+----
+
+- `line_noeol` is a special-case to support documents that don't end in EOL.
+  Grammar could be a bit simpler if we just require EOL at end of document.
+- `line_modeline` (only at EOF)
+- `column_heading` should not allow hotlinks. This is sometimes used in old help files to show results of a code example, e.g. in `usr_41.txt`:
+  ```
+  List concatenation is done with +: >
+  :echo alist + ['foo', 'bar']
+  < ['foo', 'bar', 'foo', 'bar'] ~
+  ```
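
The two `url` items under "Known issues" can be reproduced outside the grammar. The sketch below is an assumed model, not the rule in `grammar.js`: it treats a whitespace-delimited token as a `url` only when the whole token matches a simplified pattern (`URL_RE`, `classify` and `classify_unwrapped` are hypothetical names).

```python
import re

# Hypothetical, simplified stand-in for the grammar's `url` token.
URL_RE = re.compile(r"https?://\S+")


def classify(token: str) -> str:
    """Classify one whitespace-delimited token as 'url' or 'word'."""
    return "url" if URL_RE.fullmatch(token) else "word"


def classify_unwrapped(token: str) -> str:
    """Like classify(), but first drop one pair of surrounding parens."""
    if token.startswith("(") and token.endswith(")"):
        token = token[1:-1]
    return classify(token)
```

Under this model `classify("(https://example.com/#yay)")` gives `word`, matching the README's description, while `classify_unwrapped` recovers the URL. Stripping one outer pair is only a partial answer: a token such as `https://example.com/(foo))` still needs balanced-paren counting to decide which `)` belongs to the URL.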

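The `block_end` and `line_noeol` notes in the README both raise the option of not supporting documents without a final EOL. If that route were taken, a caller could normalize the text before parsing. A minimal sketch, assuming a hypothetical helper `ensure_trailing_eol` that is not part of this repository:

```python
def ensure_trailing_eol(text: str) -> str:
    """Return text with a final newline appended if it lacks one.

    Empty input is returned unchanged so that an empty document stays empty.
    """
    if text and not text.endswith("\n"):
        return text + "\n"
    return text
```

With input normalized this way, special cases such as `line_noeol` would never be reached, which is the simplification the TODO list hints at.
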
corpus/arguments.txt

+37-16
@@ -1,31 +1,52 @@
 ================================================================================
-Simple argument
+simple argument
 ================================================================================
 This in an argument: {arg}
 --------------------------------------------------------------------------------

 (help_file
-  (line
-    (word)
-    (word)
-    (word)
-    (word)
-    (argument
-      (word))))
+  (block
+    (line
+      (word)
+      (word)
+      (word)
+      (word)
+      (argument
+        (word)))))

 ================================================================================
-Multiple arguments on the same line
+multiple arguments on the same line
 ================================================================================
-
 {foo} {bar} {baz}

 --------------------------------------------------------------------------------

 (help_file
-  (line
-    (argument
-      (word))
-    (argument
-      (word))
-    (argument
+  (block
+    (line
+      (argument
+        (word))
+      (argument
+        (word))
+      (argument
+        (word)))))
+
+================================================================================
+NOT an argument
+================================================================================
+{foo "{bar}" `{baz}` |{baz| }
+
+--------------------------------------------------------------------------------
+
+(help_file
+  (block
+    (line
+      (argument
+        (word)
+        (ERROR))
+      (word)
+      (backtick
+        (word))
+      (hotlink
+        (word))
       (word))))
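
Each test in these corpus files follows the same layout: a line of `=`, the test name, another line of `=`, the input text, a line of `-`, then the expected S-expression. A minimal Python sketch of splitting a file into its tests is shown below. `parse_corpus` is a hypothetical helper, not tree-sitter's own test runner, and it assumes the input text contains no line made only of `=` or `-` characters (real vimdoc often does, so this is for illustration only).

```python
import re

HEADER = re.compile(r"^={3,}$")   # line of '=' around the test name
DIVIDER = re.compile(r"^-{3,}$")  # line of '-' between input and expected tree


def parse_corpus(text: str) -> list[dict[str, str]]:
    """Split corpus text into dicts with 'name', 'input' and 'expected'."""
    lines = text.splitlines()
    tests = []
    i = 0
    while i < len(lines):
        # A test starts with: header line, name line, header line.
        is_header = (
            i + 2 < len(lines)
            and HEADER.match(lines[i])
            and HEADER.match(lines[i + 2])
        )
        if not is_header:
            i += 1
            continue
        name = lines[i + 1].strip()
        i += 3
        source = []
        while i < len(lines) and not DIVIDER.match(lines[i]):
            source.append(lines[i])
            i += 1
        i += 1  # skip the divider line
        expected = []
        while i < len(lines) and not HEADER.match(lines[i]):
            expected.append(lines[i])
            i += 1
        tests.append({
            "name": name,
            "input": "\n".join(source),  # kept raw: whitespace is significant
            "expected": "\n".join(expected).strip(),
        })
    return tests
```

Applied to the hunk above, the first entry would carry the name `simple argument` and an expected tree that now starts with `(help_file` followed by `(block`, which is the structural change this commit makes to every test.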

corpus/backtick.txt

+70-23
@@ -1,45 +1,92 @@
 ================================================================================
-Simple backtick
+simple backtick
 ================================================================================

-`foobar`
+a `foobar` b `:echo`

 --------------------------------------------------------------------------------

 (help_file
-  (line
-    (backtick
-      (word))))
+  (block
+    (line
+      (word)
+      (backtick
+        (word))
+      (word)
+      (backtick
+        (word)))))

 ================================================================================
-Backtick in text
+backtick in text
 ================================================================================

-Hello `world`, I am a markup language
+Hello `world`, I am `markup language`. But `this is
+an error`.

 --------------------------------------------------------------------------------

 (help_file
-  (line
-    (word)
-    (backtick
-      (word))
-    (word)
-    (word)
-    (word)
-    (word)
-    (word)
-    (word)))
+  (block
+    (line
+      (word)
+      (backtick
+        (word))
+      (word)
+      (word)
+      (word)
+      (backtick
+        (word))
+      (word)
+      (word)
+      (backtick
+        (word)
+        (MISSING "`")))
+    (line
+      (word)
+      (word))))

 ================================================================================
-Backtick with command inside
+NOT a codespan / backtick
 ================================================================================
-
-`:echo`
+*'* *'a* *`* *`a*
+'{a-z} `{a-z} Jump to the mark.
+*g'* *g'a* *g`* *g`a*
+g'{mark} g`{mark}

 --------------------------------------------------------------------------------

 (help_file
-  (line
-    (backtick
-      (word))))
+  (block
+    (line
+      (tag
+        (word))
+      (tag
+        (word))
+      (tag
+        (word))
+      (tag
+        (word)))
+    (ERROR)
+    (line
+      (argument
+        (word))
+      (word)
+      (word)
+      (word)
+      (word))
+    (line
+      (tag
+        (word))
+      (tag
+        (word))
+      (tag
+        (word))
+      (tag
+        (word)))
+    (line
+      (word)
+      (argument
+        (word))
+      (word)
+      (argument
+        (word)))))
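
The "backtick in text" test above expects `(MISSING "`")` for a span that opens on one line and closes on the next, which suggests that a backtick span has to close on the line where it opens. A minimal Python sketch of that single point follows. `scan_backticks` is a hypothetical helper; it does not reproduce how the grammar handles the stray closing backtick on the following line, nor the tag and mark cases in the "NOT a codespan" test.

```python
def scan_backticks(line: str) -> list[tuple[str, bool]]:
    """Return (text, closed) for each backtick span opened on one line.

    closed is False when no closing backtick is found before end of line.
    """
    spans = []
    pos = 0
    while True:
        start = line.find("`", pos)
        if start == -1:
            break
        end = line.find("`", start + 1)
        if end == -1:
            # Opened but never closed on this line.
            spans.append((line[start + 1:], False))
            break
        spans.append((line[start + 1:end], True))
        pos = end + 1
    return spans
```

For the first input line of that test, the scan reports two closed spans (`world`, `markup language`) and one unclosed span (`this is`), which lines up with the two complete `(backtick (word))` nodes and the one carrying `MISSING` in the expected tree.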
