Commit fea082c

Merge pull request #61 from zhangzhuang15/dev
Dev
2 parents 16f1f1a + addc765 commit fea082c

4 files changed: 275 additions and 0 deletions
.vitepress/config.ts

Lines changed: 4 additions & 0 deletions
@@ -376,6 +376,10 @@ export default defineConfig({
   text: 'Talk about pthread',
   link: '/blog/pthread'
 },
+{
+  text: 'How to Write Interpreter',
+  link: '/blog/interpreter'
+},
 {
   text: "使用vue遇到的一些坑",
   link: '/blog/vue-apply'

docs/blog/interpreter.md

Lines changed: 225 additions & 0 deletions
@@ -0,0 +1,225 @@
---
title: "How to Write Interpreter"
page: true
aside: true
---

# How to Write Interpreter
I will walk through how to write an interpreter, based on two open-source projects: [kylin-go](https://github.com/zmh-program/kylin-go), written in Go, and [picol](https://github.com/antirez/picol), written in C.

I recommend reading **kylin-go** first: it is simpler, more self-explanatory, and easier to understand, especially for beginners.

## Overview
Writing an interpreter is not easy. A mature interpreter involves techniques such as GC (garbage collection) and JIT (just-in-time compilation). For beginners, these techniques are neither important nor necessary, so we start with the basic theory.

First, you need grammar rules.

Second, transform the source code into tokens, following the grammar rules. This is what the lexer does.

Third, transform the tokens into execution sequences. This is what the parser does.

Finally, prepare scopes and run the execution sequences. This is what the runtime does.

OK, let's dive into these parts.

## Lexer
The lexer transforms source code into tokens. For example, given this source:
```js
let age = 0;
function hello() {
    console.log("hi")
}
hello();
```

After the lexer has processed it, we get tokens like:
```js
tokens = [
    { type: 'keyword', value: 'let' },
    { type: 'identifier', value: 'age' },
    { type: 'operator', value: '=' },
    { type: 'number', value: '0' },
    { type: 'semicolon', value: ';' },
    { type: 'keyword', value: 'function' },
    { type: 'identifier', value: 'hello' },
    { type: 'left-bracket', value: '(' },
    { type: 'right-bracket', value: ')' },
    { type: 'left-brace', value: '{' },
    { type: 'identifier', value: 'console' },
    { type: 'dot', value: '.' },
    { type: 'identifier', value: 'log' },
    { type: 'left-bracket', value: '(' },
    { type: 'left-double-quotation', value: '"' },
    { type: 'identifier', value: 'hi' },
    { type: 'right-double-quotation', value: '"' },
    { type: 'right-bracket', value: ')' },
    { type: 'right-brace', value: '}' },
    { type: 'identifier', value: 'hello' },
    { type: 'left-bracket', value: '(' },
    { type: 'right-bracket', value: ')' },
    { type: 'semicolon', value: ';' },
]
```

The lexer only cares about individual words, not about the relationships between them. As a result, it produces tokens word by word.
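
To make the word-by-word idea concrete, here is a minimal lexer sketch in JavaScript. It is not how kylin-go or picol do it: the names `tokenize`, `KEYWORDS` and `PUNCTUATION` are my own assumptions, and for simplicity it emits a string literal as one `string` token rather than the separate quotation tokens listed above.

```js
// Minimal lexer sketch. KEYWORDS, PUNCTUATION and tokenize are
// hypothetical names chosen for this illustration only.
const KEYWORDS = new Set(['let', 'function']);
const PUNCTUATION = {
  '=': 'operator',
  ';': 'semicolon',
  '.': 'dot',
  '(': 'left-bracket',
  ')': 'right-bracket',
  '{': 'left-brace',
  '}': 'right-brace',
};

function tokenize(source) {
  const tokens = [];
  let i = 0;
  while (i < source.length) {
    const ch = source[i];
    if (/\s/.test(ch)) {
      i++; // whitespace only separates words
      continue;
    }
    if (/[A-Za-z_]/.test(ch)) {
      // read a whole word, then decide: keyword or identifier
      let j = i;
      while (j < source.length && /[A-Za-z0-9_]/.test(source[j])) j++;
      const word = source.slice(i, j);
      tokens.push({ type: KEYWORDS.has(word) ? 'keyword' : 'identifier', value: word });
      i = j;
      continue;
    }
    if (/[0-9]/.test(ch)) {
      // integer literal, kept as text like in the token list above
      let j = i;
      while (j < source.length && /[0-9]/.test(source[j])) j++;
      tokens.push({ type: 'number', value: source.slice(i, j) });
      i = j;
      continue;
    }
    if (ch === '"') {
      // string literal without escape sequences (assumption: one token)
      const end = source.indexOf('"', i + 1);
      if (end === -1) throw new Error('unterminated string');
      tokens.push({ type: 'string', value: source.slice(i + 1, end) });
      i = end + 1;
      continue;
    }
    if (PUNCTUATION[ch]) {
      tokens.push({ type: PUNCTUATION[ch], value: ch });
      i++;
      continue;
    }
    throw new Error(`unexpected character: ${ch}`);
  }
  return tokens;
}
```

Calling `tokenize('let age = 0;')` yields the first five tokens of the list above. Notice that the function never looks back at earlier tokens; deciding what a `let` followed by `=` means is left to the parser.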

## Parser
The parser transforms tokens into execution sequences.

When the parser finds the keyword `let` followed by the operator `=`, it creates an assignment execution sequence like:
```js
executionSequence = {
    type: 'assignment',
    variableName: 'age',
    value: 0
}
```

When the next token is a semicolon, the parser just skips it.

When the next token is the keyword `function`, the parser creates a function-definition execution sequence:
```js
executionSequence = {
    type: 'function-definition',
    functionName: 'hello',
    args: [],
    body: [
        {
            type: 'method-call',
            objName: 'console',
            methodPath: ['log'],
            args: [
                { type: 'literal', value: 'hi' }
            ]
        }
    ]
}
```

As a result, we get these sequences:
```js
sequences = [
    {
        type: 'assignment',
        variableName: 'age',
        value: 0
    },
    {
        type: 'function-definition',
        functionName: 'hello',
        args: [],
        body: [
            {
                type: 'method-call',
                objName: 'console',
                methodPath: ['log'],
                args: [
                    { type: 'literal', value: 'hi' }
                ]
            }
        ]
    },
    {
        type: 'function-call',
        functionName: 'hello',
        args: [],
    }
]
```
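
The parsing steps above can be sketched as a small recursive-descent parser. This is only an illustration under my own assumptions: the function name `parse` is hypothetical, string literals are expected to arrive as a single `string` token, functions take no parameters, and arguments are limited to literals without commas.

```js
// Minimal parser sketch: turns a token array into execution sequences.
// `parse` and the 'string' token type are assumptions of this sketch.
function parse(tokens) {
  let pos = 0;
  const peek = () => tokens[pos];
  // consume one token, optionally checking its type
  const next = (type) => {
    const tok = tokens[pos++];
    if (!tok || (type && tok.type !== type)) {
      throw new Error(`expected ${type}, got ${tok ? tok.type : 'end of input'}`);
    }
    return tok;
  };

  function parseArgs() {
    next('left-bracket');
    const args = [];
    while (peek() && peek().type !== 'right-bracket') {
      const tok = next();
      if (tok.type === 'string') args.push({ type: 'literal', value: tok.value });
      else if (tok.type === 'number') args.push({ type: 'literal', value: Number(tok.value) });
      else throw new Error(`unsupported argument token: ${tok.type}`);
    }
    next('right-bracket');
    return args;
  }

  function parseStatement() {
    const tok = next();
    if (tok.type === 'semicolon') return null; // just skip it
    if (tok.type === 'keyword' && tok.value === 'let') {
      const variableName = next('identifier').value;
      next('operator'); // the '='
      const value = Number(next('number').value);
      return { type: 'assignment', variableName, value };
    }
    if (tok.type === 'keyword' && tok.value === 'function') {
      const functionName = next('identifier').value;
      next('left-bracket');
      next('right-bracket'); // parameters are not supported in this sketch
      next('left-brace');
      const body = [];
      while (peek() && peek().type !== 'right-brace') {
        const stmt = parseStatement(); // the body is parsed recursively
        if (stmt) body.push(stmt);
      }
      next('right-brace');
      return { type: 'function-definition', functionName, args: [], body };
    }
    if (tok.type === 'identifier') {
      const methodPath = [];
      while (peek() && peek().type === 'dot') {
        next('dot');
        methodPath.push(next('identifier').value);
      }
      const args = parseArgs();
      return methodPath.length === 0
        ? { type: 'function-call', functionName: tok.value, args }
        : { type: 'method-call', objName: tok.value, methodPath, args };
    }
    throw new Error(`unexpected token: ${tok.type}`);
  }

  const sequences = [];
  while (pos < tokens.length) {
    const stmt = parseStatement();
    if (stmt) sequences.push(stmt);
  }
  return sequences;
}
```

Unlike the lexer, the parser does look at the relationship between tokens: an identifier followed by `.` becomes a `method-call`, while an identifier followed directly by `(` becomes a `function-call`.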

## Runtime
The runtime prepares scopes and executes the sequences.

We have these sequences:
```js
sequences = [
    {
        type: 'assignment',
        variableName: 'age',
        value: 0
    },
    {
        type: 'function-definition',
        functionName: 'hello',
        args: [],
        body: [
            {
                type: 'method-call',
                objName: 'console',
                methodPath: ['log'],
                args: [
                    { type: 'literal', value: 'hi' }
                ]
            }
        ]
    },
    {
        type: 'function-call',
        functionName: 'hello',
        args: [],
    }
]
```
But there are some questions we still have to answer:
1. Where is `age` stored?
2. Where does `console` come from?

This is the runtime's job. It prepares a scope to hold such names:
```js
globalScope = { parent: null }
```

For a scope lookup to find `console`, the runtime also has to register built-ins in this scope; they are left out of the listings here for brevity.

The runtime executes the first sequence, which adds `age` to the scope:
```js
globalScope = { parent: null, age: 0 }
```

The runtime executes the second sequence, which stores the definition of `hello` in the scope:
```js
globalScope = {
    parent: null,
    age: 0,
    hello: {
        args: [],
        body: [
            {
                type: 'method-call',
                objName: 'console',
                methodPath: ['log'],
                args: [
                    { type: 'literal', value: 'hi' }
                ]
            }
        ]
    }
}
```

The runtime executes the last sequence, the call to `hello`. It creates a new scope for the call, then executes each sequence in the function body, here the `method-call`:
```js
// the call gets its own scope, linked to the global scope
scope = { parent: globalScope }

if (sequence.type === 'method-call') {
    // find obj by following the parent link of scope
    const obj = scope[sequence.objName] || scope.parent[sequence.objName]
    // walk methodPath from obj to reach the method, e.g. console.log
    const method = sequence.methodPath.reduce((prev, state) => prev[state], obj)
    const args = sequence.args.map(arg => {
        if (arg.type === 'literal') return arg.value;
        if (arg.type === 'method-call') {
            const temp_scope = { parent: scope }
            // create another scope, execute recursively
        }
    })
    scope.result = method(...args)
}

// the call has finished, so detach the scope
scope.parent = null;
```

Now you have seen how the runtime works. A runtime scope is basically a data structure of the native language used to write the interpreter, e.g. a C struct, a C++ class, a Go struct, a Rust struct, a Swift class, or a Zig struct.
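
The runtime steps above can be put together in one small sketch. The names `lookup` and `execute` are hypothetical, and I deviate from the listings in two ways that are my own assumptions: variables live in a separate `vars` object so they cannot clash with the `parent` link, and each function remembers the scope it was defined in (`closure`). The stand-in `console` object only records what would be printed.

```js
// Walk the parent links until some scope owns `name`.
function lookup(scope, name) {
  for (let s = scope; s; s = s.parent) {
    if (Object.prototype.hasOwnProperty.call(s.vars, name)) return s.vars[name];
  }
  throw new Error(`undefined name: ${name}`);
}

// Execute a list of sequences inside `scope`.
function execute(sequences, scope) {
  for (const seq of sequences) {
    switch (seq.type) {
      case 'assignment':
        scope.vars[seq.variableName] = seq.value;
        break;
      case 'function-definition':
        // remember the defining scope so calls can link back to it
        scope.vars[seq.functionName] = { args: seq.args, body: seq.body, closure: scope };
        break;
      case 'function-call': {
        const fn = lookup(scope, seq.functionName);
        // every call gets a fresh scope whose parent is the defining scope
        execute(fn.body, { parent: fn.closure, vars: {} });
        break;
      }
      case 'method-call': {
        const obj = lookup(scope, seq.objName);
        // all path segments but the last select the owner of the method
        const owner = seq.methodPath.slice(0, -1).reduce((o, key) => o[key], obj);
        const method = owner[seq.methodPath[seq.methodPath.length - 1]];
        const args = seq.args.map((arg) => arg.value); // literals only
        method.apply(owner, args);
        break;
      }
      default:
        throw new Error(`unknown sequence type: ${seq.type}`);
    }
  }
}

// The runtime registers built-ins before running anything.
const printed = [];
const globalScope = {
  parent: null,
  vars: { console: { log: (message) => printed.push(message) } },
};

execute(
  [
    { type: 'assignment', variableName: 'age', value: 0 },
    {
      type: 'function-definition',
      functionName: 'hello',
      args: [],
      body: [
        { type: 'method-call', objName: 'console', methodPath: ['log'], args: [{ type: 'literal', value: 'hi' }] },
      ],
    },
    { type: 'function-call', functionName: 'hello', args: [] },
  ],
  globalScope
);
```

After this runs, `globalScope.vars.age` is `0` and `printed` holds the single entry `'hi'`. The call scope is simply dropped when `execute` returns; in a native language without garbage collection, that is the point where the interpreter would have to free it.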

## Difficulty
I have shown how to write a simple interpreter; a real interpreter is much more sophisticated.

For example, how should the lifetime of a `scope` be managed? This is the topic of GC.

## How to Speed up
We write the interpreter in a native language, e.g. C/C++/Go/Rust. We can transform source code into tokens, transform the tokens into execution sequences, and save the execution sequences to disk. In short, we call this preprocessing **compile**. Then the runtime only has to execute the compiled file, which is faster than running the source directly because lexing and parsing are skipped.
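
As a rough sketch of this idea, the execution sequences can be serialized to text once and loaded back later. The helper names `compileToText` and `loadCompiled` are hypothetical, and JSON is just one possible file format that I picked for illustration; a real interpreter would more likely use a compact binary format.

```js
// Hypothetical helpers: the "compiled file" here is plain JSON text.
function compileToText(sequences) {
  return JSON.stringify(sequences);
}

function loadCompiled(text) {
  const sequences = JSON.parse(text);
  if (!Array.isArray(sequences)) {
    throw new Error('a compiled file must contain an array of sequences');
  }
  return sequences;
}

const sequences = [
  { type: 'assignment', variableName: 'age', value: 0 },
  { type: 'function-call', functionName: 'hello', args: [] },
];

// Done once, ahead of time: this string is what gets written to disk.
const compiledText = compileToText(sequences);

// Done on every later run: no lexer and no parser are involved any more.
const restored = loadCompiled(compiledText);
```

In Node.js the string could be persisted with `fs.writeFileSync` and read back with `fs.readFileSync`; the runtime then starts directly from `restored`.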

There is another way to speed things up. When a compiled file is executed, functions defined in the source code are eventually translated into native-language functions. We can cache these functions. The next time the same function is executed, no translation is needed: we take the function from the cache and execute it directly. This makes execution faster too.
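
Here is a minimal sketch of such a cache, with JavaScript closures standing in for native functions. All names (`compiledCache`, `compileBody`, `callFunction`, `compileCount`) are hypothetical, the cache is keyed by function name as a simplifying assumption, and only `method-call` sequences with literal arguments are supported.

```js
// Cache of already translated functions, keyed by function name.
const compiledCache = new Map();
let compileCount = 0; // only here to show how often translation happens

// Translate a function body into one host-language closure.
function compileBody(body) {
  compileCount++;
  const steps = body.map((seq) => {
    if (seq.type !== 'method-call') {
      throw new Error(`unsupported sequence type: ${seq.type}`);
    }
    const args = seq.args.map((arg) => arg.value); // resolved once, not per call
    const path = seq.methodPath.slice(0, -1);
    const methodName = seq.methodPath[seq.methodPath.length - 1];
    return (env) => {
      const owner = path.reduce((o, key) => o[key], env[seq.objName]);
      return owner[methodName](...args);
    };
  });
  return (env) => {
    for (const step of steps) step(env);
  };
}

// Call a function, translating it only the first time.
function callFunction(name, definition, env) {
  let native = compiledCache.get(name);
  if (!native) {
    native = compileBody(definition.body);
    compiledCache.set(name, native);
  }
  native(env);
}

const printed = [];
const env = { console: { log: (message) => printed.push(message) } };
const hello = {
  args: [],
  body: [
    { type: 'method-call', objName: 'console', methodPath: ['log'], args: [{ type: 'literal', value: 'hi' }] },
  ],
};

callFunction('hello', hello, env); // translated, cached, then executed
callFunction('hello', hello, env); // taken from the cache
```

After both calls `compileCount` is still `1`, while `printed` holds two entries. A real JIT goes much further and emits machine code, but the caching idea is the same.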

docs/go/FAQ.md

Lines changed: 43 additions & 0 deletions
@@ -203,6 +203,49 @@ func main() {
 }
 ```
 
+Here is another example:
+```go
+package main
+
+import (
+	"encoding/json"
+	"fmt"
+)
+
+type Cmd struct {
+	Name  string
+	Value interface{}
+}
+
+type Data struct {
+	Name string
+	Val  []int
+}
+
+func main() {
+	cmd := Cmd{
+		Name: "hello",
+		Value: Data{
+			Name: "world",
+			Val:  []int{20, 30},
+		},
+	}
+
+	if bytes, err := json.Marshal(cmd); err == nil {
+		fmt.Println(string(bytes))
+
+		var value Cmd
+		if err = json.Unmarshal(bytes, &value); err == nil {
+			// Although Cmd.Value is declared as interface{}, unmarshalling
+			// decodes it into a map rather than a Data value. This dynamic
+			// handling is not type-safe, but it is very convenient.
+			fmt.Println(value.Value.(map[string]any)["Val"])
+		}
+	}
+}
+
+```
+
 ## Inheritance
 Go implements inheritance through composition.

docs/tool/npm-package-cmd.md

Lines changed: 3 additions & 0 deletions
@@ -88,4 +88,7 @@ you can use pkg-dir. Project root is the directory including `package.json`.
 
 As you can imagine, pkg-dir looks for the first `package.json`, searching from the current directory up through its parent directories.
 
+## tabtab
+A library that helps you complete your input by pressing the TAB key.
+
 <Giscus />
