Compiler internals
The following steps are executed in order:
First, the lexer translates the source code into a list of tokens.
This is where some sequences of characters are understood as tokens (for example, - is the Minus token but -> is the Arrow token).
This is also where literals (integers/floats/strings), identifiers and comments are handled.
Each token records its source file path, its starting line/column and its ending line/column, so that an error found at any compilation stage can point to a position in the source code.
For example, let foo = 42; becomes Let Identifier(foo) Equal IntegerValue(42) Semicolon.
Note that at this step, types aren't known: let x: vec3[f32]; is Let Identifier(x) Colon Identifier(vec3) OpenSquareBracket Identifier(f32) ClosingSquareBracket Semicolon.
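The tokenization step can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the actual NZSL lexer: the regular expressions, the Token class, the FloatingPointValue name and the tokenize/describe helpers are made up for the example. Only the other token names come from the text above.

```python
import re
from dataclasses import dataclass

# Hypothetical token table. Order matters: "->" must be tried before "-".
TOKEN_SPEC = [
    ("FloatingPointValue", r"\d+\.\d+"),   # assumed name, not from the text
    ("IntegerValue", r"\d+"),
    ("Identifier", r"[A-Za-z_]\w*"),
    ("Arrow", r"->"),
    ("Minus", r"-"),
    ("Equal", r"="),
    ("Colon", r":"),
    ("Semicolon", r";"),
    ("OpenSquareBracket", r"\["),
    ("ClosingSquareBracket", r"\]"),
    ("Newline", r"\n"),
    ("Whitespace", r"[ \t]+"),
]
KEYWORDS = {"let": "Let"}
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


@dataclass
class Token:
    kind: str
    text: str
    line: int     # 1-based line of the first character
    column: int   # 1-based column of the first character


def tokenize(source):
    tokens = []
    line, line_start, pos = 1, 0, 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at line {line}")
        kind, text = match.lastgroup, match.group()
        if kind == "Newline":
            line += 1
            line_start = match.end()
        elif kind != "Whitespace":
            if kind == "Identifier":
                kind = KEYWORDS.get(text, kind)   # keywords look like identifiers
            tokens.append(Token(kind, text, line, match.start() - line_start + 1))
        pos = match.end()
    return tokens


def describe(tokens):
    """Render tokens the way they are written in this page."""
    with_payload = ("Identifier", "IntegerValue", "FloatingPointValue")
    return " ".join(f"{t.kind}({t.text})" if t.kind in with_payload else t.kind
                    for t in tokens)
```

With this sketch, describe(tokenize("let foo = 42;")) gives back the token sequence shown above, and each Token keeps the line/column needed for error reporting.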
The parser takes a token list and outputs an Abstract Syntax Tree.
This is where tokens are associated together to form nodes. For example, with the token sequence Let Identifier(foo) Equal IntegerValue(42) Semicolon (let foo = 42;):
- The parser finds the Let token and thus knows it has to parse a variable declaration.
- The parser then expects an Identifier token giving the variable its name.
- The parser knows it may encounter a Colon token; if so, it consumes it and then parses the variable type.
- Afterwards, the parser expects an Assign (=) token followed by the variable expression. This part is mandatory unless a variable type has been provided.
- It finally expects a Semicolon to end the variable declaration.
The result is an AST node: DeclareVariableStatement(name: "foo", type: none, expression: ConstantValue(42)).
Of course, this is a simplified example, as variable declarations are not expected outside of functions, which are statements of their own.
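The parsing steps listed above can be sketched as a small recursive-descent routine. This is a simplified illustration, not the real NZSL parser: the Parser class and its methods are hypothetical, tokens are plain (kind, text) pairs, types are limited to a single identifier and initializers to an integer literal. The "=" token is called Equal here, as in the token sequence above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ConstantValue:
    value: int


@dataclass
class IdentifierExpression:
    name: str


@dataclass
class DeclareVariableStatement:
    name: str
    type: Optional[IdentifierExpression]
    expression: Optional[ConstantValue]


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens   # list of (kind, text) pairs
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos][0] if self.pos < len(self.tokens) else "EndOfStream"

    def expect(self, kind):
        if self.peek() != kind:
            raise SyntaxError(f"expected {kind}, got {self.peek()}")
        token = self.tokens[self.pos]
        self.pos += 1
        return token

    def parse_variable_declaration(self):
        self.expect("Let")                          # starts a declaration
        name = self.expect("Identifier")[1]         # variable name
        var_type = None
        if self.peek() == "Colon":                  # optional type
            self.expect("Colon")
            var_type = IdentifierExpression(self.expect("Identifier")[1])
        expression = None
        # The initializer is mandatory unless a type has been provided.
        if var_type is None or self.peek() == "Equal":
            self.expect("Equal")
            expression = ConstantValue(int(self.expect("IntegerValue")[1]))
        self.expect("Semicolon")                    # ends the declaration
        return DeclareVariableStatement(name, var_type, expression)
```

Feeding it the tokens of let foo = 42; yields DeclareVariableStatement(name="foo", type=None, expression=ConstantValue(42)), and a declaration with neither a type nor an initializer is rejected with a SyntaxError.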
Types and expressions are processed the same way at this point, for example:
- arr[idx], "access the idx entry of the arr array", is AccessIdentifierExpression(expression: IdentifierExpression("arr"), identifier: "idx")
- vec3[f32], "vector of three floating-point values", is AccessIdentifierExpression(expression: IdentifierExpression("vec3"), identifier: "f32")
Types will be resolved in the next step.
The AST produced by the parser then undergoes several transformations, in multiple passes.
The resolve pass is the most important one: it is responsible for resolving identifiers, types and imports.
It is also responsible for removing aliases and handling loop unrolling.
One of the most important things it does is give a unique id to every variable, struct, function, etc., so that identifiers can later be resolved to simple numerical ids.
For example:
let x: i32 = 42;
let y = x + 1;
becomes after parsing:
DeclareVariableStatement(name: "x", type: Identifier("i32"), value: ConstantValue(42))
DeclareVariableStatement(name: "y", value: BinaryExpression(op: Add, lhs: Identifier("x"), rhs: ConstantValue(1)))
The resolve pass will roughly transform this into:
0 = DeclareVariableStatement(name: "x", type: PrimitiveType::Int32, value: ConstantValue(42))
1 = DeclareVariableStatement(name: "y", type: PrimitiveType::Int32, value: BinaryExpression(op: Add, lhs: VariableValueExpression(0), rhs: ConstantValue(1)))
Notice how the type of y was inferred from its initial value.
The resolve pass is also responsible for giving each expression the right type, depending on the operation it performs.
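To make the id assignment and type inference more concrete, here is a minimal Python sketch of what such a pass could look like for the two declarations above. It is an assumption-laden illustration, not the actual resolve pass: the node classes are simplified, types are plain strings such as "i32" instead of PrimitiveType values, and only constants, identifiers and binary expressions are handled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ConstantValue:
    value: object


@dataclass
class Identifier:
    name: str


@dataclass
class VariableValueExpression:
    variable_id: int


@dataclass
class BinaryExpression:
    op: str
    lhs: object
    rhs: object


@dataclass
class DeclareVariableStatement:
    name: str
    type: Optional[str]
    value: object


def resolve(statements):
    ids = {}        # variable name -> numerical id
    types = {}      # numerical id -> resolved type
    resolved = {}   # numerical id -> resolved declaration

    def resolve_expression(expr):
        """Return the resolved expression along with its type."""
        if isinstance(expr, ConstantValue):
            return expr, ("f32" if isinstance(expr.value, float) else "i32")
        if isinstance(expr, Identifier):
            if expr.name not in ids:
                raise NameError(f"unknown identifier {expr.name}")
            variable_id = ids[expr.name]
            return VariableValueExpression(variable_id), types[variable_id]
        if isinstance(expr, BinaryExpression):
            lhs, lhs_type = resolve_expression(expr.lhs)
            rhs, rhs_type = resolve_expression(expr.rhs)
            if lhs_type != rhs_type:
                raise TypeError(f"cannot apply {expr.op} to {lhs_type} and {rhs_type}")
            return BinaryExpression(expr.op, lhs, rhs), lhs_type
        raise TypeError(f"unsupported node {expr!r}")

    for statement in statements:
        value, value_type = resolve_expression(statement.value)
        declared_type = statement.type or value_type   # infer when omitted
        if declared_type != value_type:
            raise TypeError(f"{statement.name}: expected {declared_type}, got {value_type}")
        variable_id = len(resolved)                    # next unique id
        ids[statement.name] = variable_id
        types[variable_id] = declared_type
        resolved[variable_id] = DeclareVariableStatement(statement.name, declared_type, value)
    return resolved
```

Running it on the parsed form of let x: i32 = 42; let y = x + 1; gives declarations 0 and 1, with y typed "i32" and its reference to x replaced by VariableValueExpression(0).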
The validation pass tries to validate the correctness of the whole AST.
It doesn't change the AST and is optional, but running it is recommended to catch errors.
Note that, for performance reasons, it doesn't re-validate what has already been validated by the resolve and binding resolver passes.
The binding resolver pass resolves the bindings of external blocks.
This pass splits branches with multiple conditions (else if) into multiple nested branches, each with a single condition.
[entry(frag)]
fn main()
{
    let value: f32;
    if (data.value > 3.0)
        value = 3.0;
    else if (data.value > 2.0)
        value = 2.0;
    else if (data.value > 1.0)
        value = 1.0;
    else
        value = 0.0;
}
=>
[entry(frag)]
fn main()
{
    let value: f32;
    if (data.value > (3.0))
    {
        value = 3.0;
    }
    else
    {
        if (data.value > (2.0))
        {
            value = 2.0;
        }
        else
        {
            if (data.value > (1.0))
            {
                value = 1.0;
            }
            else
            {
                value = 0.0;
            }
        }
    }
}
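The branch-splitting transformation shown above is naturally recursive: the first condition is kept, and everything after it moves into its else body. The following Python sketch illustrates the idea on a hypothetical BranchStatement node (the class and the split_branch function are assumptions for the example, and the sketch does not recurse into the bodies themselves).

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class BranchStatement:
    # (condition, body) pairs: the first one is the "if", the others are "else if".
    conditions: List[Tuple[str, list]]
    else_body: Optional[list] = None


def split_branch(branch):
    """Rewrite an if / else if / else chain as nested single-condition branches."""
    (condition, body), *remaining = branch.conditions
    if not remaining:
        return BranchStatement([(condition, body)], branch.else_body)
    # The remaining "else if" conditions become a new branch nested in the else body.
    nested = split_branch(BranchStatement(remaining, branch.else_body))
    return BranchStatement([(condition, body)], [nested])
```

After the transformation, every BranchStatement holds exactly one condition, which is the shape the backends can emit directly.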
This pass removes compound assignments (such as +=) and turns them into regular variable assignments.
fn main()
{
    let x = 1;
    let y = 2;
    x += y;
    x += 1;
}
=>
fn main()
{
    let x = 1;
    let y = 2;
    x = x + y;
    x = x + 1;
}
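As a rough illustration of the rewrite above, the following Python sketch lowers a compound assignment into a plain assignment of a binary expression. The Assign and Binary classes and the lower_compound function are hypothetical and only model the operator, the target and the value.

```python
from dataclasses import dataclass


@dataclass
class Binary:
    op: str
    lhs: object
    rhs: object


@dataclass
class Assign:
    op: str        # "=" or a compound operator such as "+="
    target: str
    value: object


def lower_compound(statement):
    """Turn `target op= value` into `target = target op value`."""
    if statement.op == "=":
        return statement                      # already a simple assignment
    binary_op = statement.op[:-1]             # "+=" -> "+"
    return Assign("=", statement.target, Binary(binary_op, statement.target, statement.value))
```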
This pass evaluates expressions whose operands are all known at compile time. It is used when resolving constants, but it can also be applied to the whole shader module to optimize what it can.
Example:
[entry(frag)]
fn main()
{
    let output = 8.0 * (7.0 + 5.0) * 2.0 / 4.0 - 6.0 % 7.0;
    let output2 = 8 * (7 + 5) * 2 / 4 - 6 % 7;
    let output3 = f64(8.0) * (f64(7.0) + f64(5.0)) * f64(2.0) / f64(4.0) - f64(6.0) % f64(7.0);
    let output4 = u32(8) * (u32(7) + u32(5)) * u32(2) / u32(4) - u32(6) % u32(7);
}
=>
[entry(frag)]
fn main()
{
    let output: f32 = 42.0;
    let output2: i32 = 42;
    let output3: f64 = f64(42.0);
    let output4: u32 = u32(42);
}
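The folding shown above can be sketched as a bottom-up walk over the expression tree: children are folded first, and a binary node is replaced by a constant when both of its operands are constants. The Python below is an illustrative sketch only. The node classes are hypothetical, integer division and remainder are assumed to truncate toward zero (C-like), and divisions by zero are simply left unfolded.

```python
import math
from dataclasses import dataclass


@dataclass
class Constant:
    value: object   # int or float


@dataclass
class Variable:
    name: str


@dataclass
class Binary:
    op: str
    lhs: object
    rhs: object


def _int_div(a, b):
    # Assumption: integer division truncates toward zero.
    quotient = abs(a) // abs(b)
    return quotient if (a < 0) == (b < 0) else -quotient


def _apply(op, a, b):
    both_int = isinstance(a, int) and isinstance(b, int)
    if op == "+":
        return a + b
    if op == "-":
        return a - b
    if op == "*":
        return a * b
    if op == "/":
        return _int_div(a, b) if both_int else a / b
    if op == "%":
        return a - b * _int_div(a, b) if both_int else math.fmod(a, b)
    raise ValueError(f"unsupported operator {op}")


def fold(expr):
    """Replace every sub-expression made only of constants by its value."""
    if not isinstance(expr, Binary):
        return expr
    lhs, rhs = fold(expr.lhs), fold(expr.rhs)
    if isinstance(lhs, Constant) and isinstance(rhs, Constant):
        if not (expr.op in ("/", "%") and rhs.value == 0):
            return Constant(_apply(expr.op, lhs.value, rhs.value))
    return Binary(expr.op, lhs, rhs)   # keep what cannot be evaluated
```

Applied to the tree for 8 * (7 + 5) * 2 / 4 - 6 % 7, fold returns Constant(42). An expression such as x + 2 * 3 is only partially folded, into x + 6.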
It can also remove branches whose condition is known to be either true or false at compile time:
[entry(frag)]
fn main()
{
    let output = 0.0;
    if (5 <= 3)
        output = 5.0;
    else if (4 <= 3)
        output = 4.0;
    else if (3 <= 3)
        output = 3.0;
    else if (2 <= 3)
        output = 2.0;
    else if (1 <= 3)
        output = 1.0;
    else
        output = 0.0;
}
=>
[entry(frag)]
fn main()
{
    let output: f32 = 0.0;
    output = 3.0;
}
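Branch removal follows a simple rule once the conditions have been folded: a branch whose condition is false is dropped, and the first branch whose condition is true ends the chain and takes the place of the final else. A minimal Python sketch of that rule, assuming a hypothetical representation where each branch is a (condition, body) pair and a condition is True, False, or anything else when it is not known at compile time:

```python
def prune_branches(branches, else_body):
    """Return (remaining branches, else body) after removing decided conditions."""
    kept = []
    for condition, body in branches:
        if condition is False:
            continue              # never taken: drop the branch
        if condition is True:
            else_body = body      # always taken when reached: becomes the else
            break                 # later branches are unreachable
        kept.append((condition, body))
    return kept, else_body
```

When no branch remains, the else body is all that is left and can be emitted unconditionally, which is what happens in the example above.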
This pass removes constant and option declarations and replaces constant/option expressions with their values, to simplify GLSL/SPIR-V generation.
This pass removes code that isn't used in any way.
It is used internally when resolving module imports, to import only the relevant code, but it can also be applied to the whole module.
Example:
[nzsl_version("1.0")]
module;
struct inputStruct
{
    value: vec4[f32]
}
struct notUsed
{
    value: vec4[f32]
}
external
{
    [set(0), binding(0)] unusedData: uniform[notUsed],
    [set(0), binding(1)] data: uniform[inputStruct]
}
fn unusedFunction() -> vec4[f32]
{
    return unusedData.value;
}
struct Output
{
    value: vec4[f32]
}
[entry(frag)]
fn main() -> Output
{
    let unusedvalue = unusedFunction();
    let output: Output;
    output.value = data.value;
    return output;
}
=>
[nzsl_version("1.0")]
module;
struct inputStruct
{
    value: vec4[f32]
}
external
{
    [set(0), binding(1)] data: uniform[inputStruct]
}
struct Output
{
    value: vec4[f32]
}
[entry(frag)]
fn main() -> Output
{
    let output: Output;
    output.value = data.value;
    return output;
}
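One common way to implement this kind of elimination is a reachability walk over a dependency graph: start from the entry points, follow what each item uses, and discard everything that was never reached. The Python sketch below applies that idea to the example above. It is an assumption about the general technique, not a description of the actual pass: the dependency graph is written by hand, local variables are modelled as nodes of their own (so that unusedvalue does not keep unusedFunction alive), and calls are assumed to have no side effects.

```python
def collect_used(dependencies, entry_points):
    """Return every name reachable from the entry points."""
    used = set()
    pending = list(entry_points)
    while pending:
        name = pending.pop()
        if name in used:
            continue
        used.add(name)
        pending.extend(dependencies.get(name, ()))
    return used


# Hand-written dependency graph for the example module (hypothetical encoding).
dependencies = {
    "main": ["Output", "main.output"],        # main only returns `output`
    "main.output": ["Output", "data"],
    "main.unusedvalue": ["unusedFunction"],   # nothing depends on this local
    "unusedFunction": ["unusedData"],
    "unusedData": ["notUsed"],
    "data": ["inputStruct"],
}

used = collect_used(dependencies, ["main"])
unused = sorted(set(dependencies) | {"notUsed", "inputStruct", "Output"} - used)
```

Here used contains main, main.output, Output, data and inputStruct, so the unused local, unusedFunction, unusedData and notUsed can all be dropped.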
This pass replaces for and for each statements with while loops.
For:
[entry(frag)]
fn main()
{
    let x = 0.0;
    for i in 0 -> 10
    {
        x += data.value[i];
    }
}
=>
[entry(frag)]
fn main()
{
    let x: f32 = 0.0;
    {
        let i: i32 = 0;
        let _nzsl_to: i32 = 10;
        while (i < _nzsl_to)
        {
            x += data.value[i];
            i += 1;
        }
    }
}
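The shape of the generated loop can be reproduced with a small text-based sketch. The lower_for function below is hypothetical and works on source lines rather than on an AST. It only illustrates the structure of the output above: an enclosing scope, a counter, an upper bound stored once in _nzsl_to, and an increment appended to the loop body.

```python
def lower_for(variable, start, end, body):
    """Return the lines of a while-based equivalent of `for variable in start -> end`."""
    lines = [
        "{",                                    # extra scope for the loop variables
        f"    let {variable}: i32 = {start};",
        f"    let _nzsl_to: i32 = {end};",      # upper bound, stored once
        f"    while ({variable} < _nzsl_to)",
        "    {",
    ]
    lines += [f"        {statement}" for statement in body]
    lines += [f"        {variable} += 1;", "    }", "}"]
    return lines
```

Calling lower_for("i", 0, 10, ["x += data.value[i];"]) produces the inner block of the transformed function shown above.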
For each:
[entry(frag)]
fn main()
{
    let x: f32 = 0.0;
    for v in data.value
    {
        x += v;
    }
}
=>
[entry(frag)]
fn main()
{
    let x: f32 = 0.0;
    {
        let _nzsl_counter: u32 = u32(0);
        while (_nzsl_counter < (u32(10)))
        {
            let v: f32 = data.value[_nzsl_counter];
            x += v;
            _nzsl_counter += u32(1);
        }
    }
}
This pass renames identifiers to avoid forbidden names (especially for GLSL).
This pass replaces matrix casting and matrix additions/subtractions with component-wise matrix constructions. This is mostly helpful to generate SPIR-V as it cannot do that easily.
Matrix addition:
fn testMat4PlusMat4(x: mat4[f32], y: mat4[f32]) -> mat4[f32]
{
    return x + y;
}
=>
fn testMat4PlusMat4(x: mat4[f32], y: mat4[f32]) -> mat4[f32]
{
    return mat4[f32](x[u32(0)] + y[u32(0)], x[u32(1)] + y[u32(1)], x[u32(2)] + y[u32(2)], x[u32(3)] + y[u32(3)]);
}
Matrix casting:
fn testMat3ToMat4(input: mat3[f32]) -> mat4[f32]
{
    return mat4[f32](input);
}
=>
fn testMat3ToMat4(input: mat3[f32]) -> mat4[f32]
{
    let _nzsl_matrix: mat4[f32];
    _nzsl_matrix[u32(0)] = vec4[f32](input[u32(0)], 0.0);
    _nzsl_matrix[u32(1)] = vec4[f32](input[u32(1)], 0.0);
    _nzsl_matrix[u32(2)] = vec4[f32](input[u32(2)], 0.0);
    _nzsl_matrix[u32(3)] = vec4[f32](0.0, 0.0, 0.0, 1.0);
    return _nzsl_matrix;
}
This pass splits array/struct assignments for wrapped (uniform/storage) types into field-by-field assignments, as assigning them as a whole is forbidden in both GLSL and SPIR-V.
This pass removes scalar swizzles (such as a.xx) and replaces them with casts. This is helpful for GLSL, as it doesn't support scalar swizzle.
Example:
fn expr() -> i32
{
    return 1;
}
fn main()
{
    let value = 42.0;
    let x = value.r;
    let y = value.xxxx;
    let z = expr().xxx;
}
=>
fn expr() -> i32
{
    return 1;
}
fn main()
{
    let value: f32 = 42.0;
    let x: f32 = value;
    let y: vec4[f32] = vec4[f32](value, value, value, value);
    let _nzsl_cachedResult: i32 = expr();
    let z: vec3[i32] = vec3[i32](_nzsl_cachedResult, _nzsl_cachedResult, _nzsl_cachedResult);
}
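The three cases of the example (a single component, several components of a variable, several components of a call result) can be sketched with a small text-based helper. The lower_scalar_swizzle function below is hypothetical and works on source strings. In particular, the is_simple flag, which states whether the expression can be repeated safely, is an assumption of the sketch.

```python
def lower_scalar_swizzle(expression, swizzle, scalar_type, is_simple=True):
    """Return (statements to emit first, replacement expression)."""
    prelude = []
    if len(swizzle) == 1:
        return prelude, expression            # value.r is just value
    if not is_simple:
        # Store the result once so the expression is not evaluated several times.
        prelude.append(f"let _nzsl_cachedResult: {scalar_type} = {expression};")
        expression = "_nzsl_cachedResult"
    arguments = ", ".join([expression] * len(swizzle))
    return prelude, f"vec{len(swizzle)}[{scalar_type}]({arguments})"
```

For expr().xxx, the helper returns one cached-result declaration followed by vec3[i32](_nzsl_cachedResult, _nzsl_cachedResult, _nzsl_cachedResult), matching the output above.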
The processed AST is then given to the final backend, which is responsible for outputting GLSL or SPIR-V, or even for converting it back to NZSL.
Since NZSL supports and was designed around options (compile-time constants whose values are given by the application, to support uber/specialized shaders), the compiler supports partial compilation.
Partial compilation is a special compiler mode in which the compiler resolves everything it can, but identifies the code it cannot resolve yet and leaves it for a later pass.
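The idea of partial compilation can be illustrated with a toy Python sketch in which some option values are known and others are not. Everything here is hypothetical: OptionBranch stands for a piece of code that depends on an option, and partially_resolve is not an NZSL API. Branches whose option is known are resolved right away, the others are kept for a later pass.

```python
from dataclasses import dataclass, field


@dataclass
class OptionBranch:
    option: str                                   # name of the option tested
    then_body: list
    else_body: list = field(default_factory=list)


def partially_resolve(statements, option_values):
    """Resolve branches on known options and keep the others untouched."""
    result = []
    for statement in statements:
        if not isinstance(statement, OptionBranch):
            result.append(statement)
            continue
        then_body = partially_resolve(statement.then_body, option_values)
        else_body = partially_resolve(statement.else_body, option_values)
        if statement.option in option_values:
            # Value known: keep only the selected body.
            result.extend(then_body if option_values[statement.option] else else_body)
        else:
            # Value unknown: leave the branch for a later pass.
            result.append(OptionBranch(statement.option, then_body, else_body))
    return result
```

Calling partially_resolve a second time, once the remaining option values are available, finishes the job, which mirrors the "later pass" mentioned above.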