Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ZON #20271

Open
wants to merge 54 commits into
base: master
Choose a base branch
from
Open

ZON #20271

wants to merge 54 commits into from

Conversation

MasonRemaley
Copy link
Contributor

@MasonRemaley MasonRemaley commented Jun 12, 2024

ZON

This PR implements ZON, or Zig Object Notation (think JSON, but with Zig syntax instead of Javascript.)

In particular, it implements:

  • Runtime parsing of ZON data
  • Runtime serialization of ZON data
  • Compile time importing of ZON data

A lot of files are added since I added a lot of test cases, the bulk of the work is in src/zon.zig and lib/std/zon. If there's anything I can do to make this easier to review, let me know.

Tests cover all new features.

Runtime

The runtime code can be found at lib/std/zon.

Parsing

lib/std/zon/parse.zig

Detailed doc comments are provided in the source. At a high level, this module provides the following functions and some configuration options:

  • std.zon.parse.fromSlice
  • std.zon.parse.fromZoir
  • std.zon.parse.fromZoirNode
  • std.zon.parse.free

Most people will just want std.zon.parse.fromSlice and maybe std.zon.parse.free, but the more fine grained functions are available for those who need them.

Stringify

lib/std/zon/stringify.zig

Detailed doc comments are provided in the source. At a high level, this module provides the following functions and some configuration options:

  • std.zon.stringify.serialize
  • std.zon.stringify.serializeMaxDepth
  • std.zon.stringify.serializeArbitraryDepth
  • std.zon.stringify.serializer

Most users will just need serialize, or one of its variants.

However, std.zon.stringify.serializer returns a more fine grained interface that is used in the implementation of the other functions. It may be used directly if you need to serialize something piece by piece--perhaps to apply different configuration to different parts of the value, or because the value does not actually exist laid out in memory in the same form in which you want to stringify it.

Compile Time

See src/zon.zig.

This PR implements importing ZON at compile time:

const Foo = @import("foo.zon");

Things that may change later...

Untagged nonexhaustive enum values not yet supported

Zig does not currently have a way to represent an untagged enum value as a literal, and as such there's no way to represent one in ZON.

We could resolve this by adding a syntax for untagged enum literals to Zig, or by allowing ZON to coerce integers to untagged enum literals (provided the destination type is known.) Personally I prefer the former solution.

  • After this PR is merged I'll file an issue to track this if still relevant.

Allowing eliding outer braces may make sense

There's case to be made to allow eliding braces on an outer struct in ZON, like this:

.x = 10,
.y = 20,
  • I'll file an issue with some motivating use cases after this is merged, easy change if we decide to make it

Divergences from Zig

Zig has no NaN or inf literals right now. Zon does, since otherwise it would have no way to represent these values.

IMO we should add the same literals to Zig, but there are good arguments not to as well, so I'm not trying to address that here.

  • After this PR is merged I'll file an issue to track this.

See Also

@MasonRemaley MasonRemaley mentioned this pull request Jun 12, 2024
src/Module.zig Outdated Show resolved Hide resolved
src/Module.zig Outdated Show resolved Hide resolved
Copy link
Member

@mlugg mlugg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding & in ZON...

You're correct that ZON definitely shouldn't include this. Our main use of ZON today (build.zig.zon) already creates variable-length structures without this syntax. To avoid a divergence between ZON you can @import and build.zig.zon, I would consider this a merge blocker, but that's Andrew's call of course.

Implementation-wise, the ZIR instruction corresponding to @import should be modified to provide a result type if one is present (you can use .none to represent the case where there isn't one). Then, the ZON analysis logic should traverse into this result type as needed. This has the additional advantage of providing better source locations if the structure of an imported file doesn't match what's expected, since the type error will occur whilst parsing the relevant ZON expression.

src/Module.zig Outdated Show resolved Hide resolved
src/Module.zig Outdated Show resolved Hide resolved
src/Module.zig Outdated Show resolved Hide resolved
src/Module.zig Outdated Show resolved Hide resolved
src/Module.zig Outdated Show resolved Hide resolved
src/Module.zig Outdated Show resolved Hide resolved
@MasonRemaley
Copy link
Contributor Author

MasonRemaley commented Jun 12, 2024

Regarding & in ZON...

You're correct that ZON definitely shouldn't include this. Our main use of ZON today (build.zig.zon) already creates variable-length structures without this syntax. To avoid a divergence between ZON you can @import and build.zig.zon, I would consider this a merge blocker, but that's Andrew's call of course.

Implementation-wise, the ZIR instruction corresponding to @import should be modified to provide a result type if one is present (you can use .none to represent the case where there isn't one). Then, the ZON analysis logic should traverse into this result type as needed. This has the additional advantage of providing better source locations if the structure of an imported file doesn't match what's expected, since the type error will occur whilst parsing the relevant ZON expression.

Hmm I think I agree, better to knock it out now than to release it as is then immediately change the syntax. I would've done this up front but incorrectly thought it required major changes to the compiler.

I'll take a look at making this change. If no result type is specified, I'll have it default to a tuple rather than a slice unless there are any objections to that default.

@mlugg
Copy link
Member

mlugg commented Jun 12, 2024

Yup. I agree that it makes sense to use a tuple when there's no result type.

By the way, in case you're unfamiliar with the AstGen code, you'll just want try rl.ri.resultType(gz, node) orelse .none in the corresponding case in AstGen.builtinCall.

tiehuis added a commit to tiehuis/zig that referenced this pull request Jun 14, 2024
There were two primary issues at play here:
 1. The hex float prefix was not handled correctly when the stream was
    reset for the fallback parsing path, which occured when the mantissa was
    longer max mantissa digits.
 2. The implied exponent was not adjusted for hex-floats in this branch.

Additionally, some of the float parsing routines have been condensed, making
use of comptime.

closes ziglang#20271
tiehuis added a commit to tiehuis/zig that referenced this pull request Jun 14, 2024
There were two primary issues at play here:
 1. The hex float prefix was not handled correctly when the stream was
    reset for the fallback parsing path, which occured when the mantissa was
    longer max mantissa digits.
 2. The implied exponent was not adjusted for hex-floats in this branch.

Additionally, some of the float parsing routines have been condensed, making
use of comptime.

closes ziglang#20271
@andrewrk
Copy link
Member

andrewrk commented Jun 16, 2024

+ echo Looking for non-conforming code formatting...
+ stage3-debug/bin/zig fmt --check .. --exclude ../test/cases/ --exclude ../doc/ --exclude ../build-debug
Looking for non-conforming code formatting...
../test/behavior/zon/a_neg.zon:1:1: error: expected type expression, found '-'
../test/behavior/zon/slice-1.zon:1:1: error: expected type expression, found '&'
../test/behavior/zon/void.zon:1:1: error: expected type expression, found '{'
../test/behavior/zon/slice-empty.zon:1:1: error: expected type expression, found '&'
../test/behavior/zon/false.zon
../test/behavior/zon/slice-abc.zon:1:1: error: expected type expression, found '&'

In case you haven't found why this is happening, it's here:

# TODO: move this to a build.zig step (check-fmt)
echo "Looking for non-conforming code formatting..."
stage3-release/bin/zig fmt --check .. \
--exclude ../test/cases/ \
--exclude ../doc/ \
--exclude ../build-debug \
--exclude ../build-release

Edit: I'll address this in a PR shortly. Sit tight for a bit.

@andrewrk
Copy link
Member

andrewrk commented Jun 17, 2024

After #20321 you should see the same failures locally when running zig build test, and you can check only code formatting conformance with zig build test-fmt.

Edit: I think if you rebase and resolve that build.zig conflict, those CI failures will disappear.

@MasonRemaley
Copy link
Contributor Author

MasonRemaley commented Jun 19, 2024

Nice thanks! I'll do the rebase and get started on removing & from the syntax when I'm back in town next week.

lib/std/zon.zig Outdated Show resolved Hide resolved
@MasonRemaley MasonRemaley force-pushed the zon branch 8 times, most recently from 0358ab4 to ce32047 Compare June 26, 2024 19:58
@MasonRemaley
Copy link
Contributor Author

MasonRemaley commented Jun 28, 2024

By the way, in case you're unfamiliar with the AstGen code, you'll just want try rl.ri.resultType(gz, node) orelse .none in the corresponding case in AstGen.builtinCall.

Thanks, I'm not familiar with this code so this is helpful!

I started implementing storing the result type on imports, but got a bit stuck.

AstGen used to use addStrTok(...) to store the import path. Since we now want to store both the import path and a result type, I store a Zir.Inst.Bin Heres's what AstGen does now:

// AstGen.builtinCall
const operand_node = params[0];
const rhs = try expr(gz, scope, .{ .rl = .none }, operand_node);
try gz.addPlNode(.import, node, Zir.Inst.Bin{
    .lhs = result_type,
    .rhs = rhs,
});

Zir used to use .str_tok to get the import path, that's also been updated:

// Zir.zirImport
const inst_data = sema.code.instructions.items(.data)[@intFromEnum(inst)].pl_node;
const extra = sema.code.extraData(Zir.Inst.Bin, inst_data.payload_index).data;
const src = block.nodeOffset(inst_data.src_node);
const operand = try sema.resolveConstString(
    block,
    src,
    extra.rhs,
    .{ .needed_comptime_reason = "import path must be comptime-known" },
);

When this code runs, resolveConstString fails:

thread 7410 panic: index out of bounds: index 123056708, len 16
Analyzing lib/std/std.zig
    > %344 = import(%3306982439, %123057044)
      %345 = break_inline(%343, %344)
    For full context, use the command
      zig ast-check -t lib/std/std.zig

  in lib/std/std.zig
    > %436 = decl_val("start")

/home/mason/Documents/zon/zig/src/Sema.zig:299:25: 0x6884afc in contains (zig)
        return map.items[@intFromEnum(key) - @intFromEnum(map.start)] != .none;
                        ^
/home/mason/Documents/zon/zig/src/Sema.zig:255:26: 0x649170c in get (zig)
        if (!map.contains(key)) return null;
                         ^
/home/mason/Documents/zon/zig/src/Sema.zig:1857:39: 0x6491605 in resolveInst (zig)
        const inst = sema.inst_map.get(i).?;
                                      ^
/home/mason/Documents/zon/zig/src/Sema.zig:1887:42: 0x6d84bcf in resolveConstString (zig)
    const air_inst = try sema.resolveInst(zir_ref);
                                         ^
/home/mason/Documents/zon/zig/src/Sema.zig:13928:48: 0x68ae4b4 in zirImport (zig)
    const operand = try sema.resolveConstString(

It appears that the value I get back in Sema for extra.rhs is garbage, instead of the one that I put in during AstGen.

Any pointers would be much appreciated--I can also push the WIP commit if that would be helpful, but I imagine that I'm probably missing something that will be obvious to people more familiar with this part of the codebase.

@mlugg
Copy link
Member

mlugg commented Jun 28, 2024

You probably just need to clear your cache directory -- ZIR is cached based on zig version, so if you haven't created a commit with your changes, the compiler will use the old cached ZIR!

However, your lowering strategy for import isn't really suitable. Since the operand must be a string literal, it's just a waste of bytes and time to use expr to actually create a str instruction. So, you shouldn't use Zir.Inst.Bin as the payload here -- instead, create a new payload type which stores result_ty: Ref, name: NullTerminatedString, and use that as the payload. (For the union field, pl_node is fine; pl_tok would also be fine.)

@MasonRemaley
Copy link
Contributor Author

You probably just need to clear your cache directory -- ZIR is cached based on zig version, so if you haven't created a commit with your changes, the compiler will use the old cached ZIR!

Ha that was it, thanks! I made the same mistake yesterday and didn't quite understand how I fixed it--that had been bugging me glad to have it resolved.

However, your lowering strategy for import isn't really suitable. Since the operand must be a string literal, it's just a waste of bytes and time to use expr to actually create a str instruction. So, you shouldn't use Zir.Inst.Bin as the payload here -- instead, create a new payload type which stores result_ty: Ref, name: NullTerminatedString, and use that as the payload. (For the union field, pl_node is fine; pl_tok would also be fine.)

Makes sense, will do!

src/print_zir.zig Outdated Show resolved Hide resolved
This was a bit odd, because it meant that only the coercion really needed to be checked, which isn't what I was intending to test.
Copy link
Member

@mlugg mlugg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This just covers std.zon.parse, not std.zon.stringify. I also didn't look over the tests yet, although I'm sure they're fine.

Comment on lines +246 to +255
/// The type to deserialize into. May only transitively contain the following supported types:
/// * bools
/// * fixed sized numeric types
/// * enums
/// * slices
/// * arrays
/// * structures
/// * unions
/// * optionals
/// * null
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After the changes we discussed to what types are accepted, it's more easy to say what's not allowed:

Suggested change
/// The type to deserialize into. May only transitively contain the following supported types:
/// * bools
/// * fixed sized numeric types
/// * enums
/// * slices
/// * arrays
/// * structures
/// * unions
/// * optionals
/// * null
/// The type to deserialize into.
/// May not be or contain any of the following types:
/// * Any comptime-only type
/// * `void`, except as a union payload
/// * `noreturn`
/// * An error set/union
/// * An opaque type, including `anyopaque`
/// * An async frame type, including `anyframe` and `anyframe->T`
/// All other types are valid.

if (status) |s| s.zoir = zoir;
if (zoir.hasCompileErrors()) return error.ParseZon;

if (status) |s| s.* = .{};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Surely this leaks zoir, since it won't be deinited by the defer and this line removes what would have been its new owner upon return?

EDIT: ah, I see, fromZoir is guaranteed to re-populate it. Perhaps just comment that here for clarity.

var zoir = try ZonGen.generate(gpa, ast, .{ .parse_str_lits = false });
defer if (status == null) zoir.deinit(gpa);
if (status) |s| s.zoir = zoir;
if (zoir.hasCompileErrors()) return error.ParseZon;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we call fromZoir, isn't this redundant? Just let fromZoir return the error.

}
}

fn parsePointer(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just throwing a comment in to track: remember to apply the type changes we discussed to std.zon too. (For instance, allowing initializing data through pointers.)

if (pointer.child != u8 or
pointer.size != .slice or
!pointer.is_const or
(pointer.sentinel_ptr != null and @as(*const u8, @ptrCast(pointer.sentinel_ptr)).* != 0) or
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
(pointer.sentinel_ptr != null and @as(*const u8, @ptrCast(pointer.sentinel_ptr)).* != 0) or
(pointer.sentinel() != null and pointer.sentinel() != 0) or

});
}
},
else => @compileError("unreachable, should not be called for type " ++ @typeName(T)),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
else => @compileError("unreachable, should not be called for type " ++ @typeName(T)),
else => comptime unreachable,

},
else => {},
}
@compileError("unreachable, should not be called for type " ++ @typeName(T));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
@compileError("unreachable, should not be called for type " ++ @typeName(T));
comptime unreachable;

const value_node = array_init.ast.elements[field];
break :b self.ast.firstToken(value_node);
};
return self.failTokenFmt(token, 0, "cannot store runtime value in compile time variable", .{});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's sort of a weird error here, because this is about runtime fields. How about: "cannot initialize comptime field of 'Foo'"? Or something along those lines.

Comment on lines +1023 to +1030
switch (@typeInfo(@TypeOf(value))) {
.float => {},
else => @compileError(@typeName(@TypeOf(value)) ++ " is not a runtime floating point type"),
}
switch (@typeInfo(T)) {
.int => {},
else => @compileError(@typeName(T) ++ " is not a runtime integer type"),
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
switch (@typeInfo(@TypeOf(value))) {
.float => {},
else => @compileError(@typeName(@TypeOf(value)) ++ " is not a runtime floating point type"),
}
switch (@typeInfo(T)) {
.int => {},
else => @compileError(@typeName(T) ++ " is not a runtime integer type"),
}

These assertions are already done thanks to the @intFromFloat call.

return null;
}

return @as(T, @intFromFloat(value));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return @as(T, @intFromFloat(value));
return @intFromFloat(value);

RLS propagates this type information automatically.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants