Invalid assumptions in type resolution logic cause incorrect resolution success, crashing compiler #23344

Open
DonIsaac opened this issue Mar 24, 2025 · 1 comment

@DonIsaac

Zig Version

0.14.0

Steps to Reproduce and Observed Behavior

Found this crash while setting up Zig tests in Bun. You can reproduce it by checking out this PR.

const std = @import("std");
const bun = @import("root").bun;
const t = std.testing;

// un-commenting this line causes the crash
// test {
//     std.testing.refAllDecls(bun);
// }

test "idk strings or something" {
    var s = bun.String.createUTF8("hi"); // it can create `WTF::String`s too
    defer s.deref();
    try t.expectEqual(s.length(), 2);
    try t.expectEqualStrings(s.asUTF8().?, "hi");
}

When building with a ReleaseSafe build of Zig, I get this stack trace:

[1/4] Building src/*.zig for aarch64-macos-none
info: zig compiler v0.14.0
test
└─ install generated to bun-test.o
   └─ zig test bun-test Debug aarch64-macos.13.0-none failure
error: thread 55734837 panic: reached unreachable code
???:?:?: 0x10988a57b in _InternPool.LoadedStructType.RuntimeOrderIterator.next (???)
???:?:?: 0x1097f9bef in _codegen.llvm.Object.lowerDebugType (???)
???:?:?: 0x1097fbeb7 in _codegen.llvm.Object.lowerDebugType (???)
???:?:?: 0x1097f9e03 in _codegen.llvm.Object.lowerDebugType (???)
???:?:?: 0x1097fbeb7 in _codegen.llvm.Object.lowerDebugType (???)
???:?:?: 0x1097f92ff in _codegen.llvm.Object.lowerDebugType (???)
???:?:?: 0x1097f9e03 in _codegen.llvm.Object.lowerDebugType (???)
???:?:?: 0x1097fbeb7 in _codegen.llvm.Object.lowerDebugType (???)
???:?:?: 0x1097f9e03 in _codegen.llvm.Object.lowerDebugType (???)
???:?:?: 0x1097fbeb7 in _codegen.llvm.Object.lowerDebugType (???)
???:?:?: 0x1097f8717 in _codegen.llvm.Object.lowerDebugType (???)
???:?:?: 0x1097fb10b in _codegen.llvm.Object.lowerDebugType (???)
???:?:?: 0x109d10fb3 in _codegen.llvm.Object.updateFunc (???)
???:?:?: 0x109a8dfa3 in _link.doTask (???)
???:?:?: 0x1097e7593 in _Compilation.performAllTheWorkInner (???)
???:?:?: 0x10972bbd3 in _Compilation.update (???)
???:?:?: 0x10976f94b in _main.serve (???)
???:?:?: 0x109789dcb in _main.buildOutputType (???)
???:?:?: 0x1096dd01f in _main (???)
???:?:?: 0x19e124273 in ??? (???)
???:?:?: 0x1806ffffffffffff in ??? (???)

error: the following command terminated unexpectedly:
/Users/donisaac/Documents/bun/bun3/vendor/zig/zig test -freference-trace=24 -fllvm -fno-lld -fno-strip -fno-omit-frame-pointer -ODebug -target aarch64-macos.13.0-none -mcpu apple_m1 --dep zlib-internal --dep async --dep ZigGeneratedClasses --dep ResolvedSourceTag --dep ErrorCode --dep completions-bash --dep completions-zsh --dep completions-fish --dep build_options --dep translated-c-headers -Mroot=/Users/donisaac/Documents/bun/bun3/src/unit_test.zig -Mzlib-internal=/Users/donisaac/Documents/bun/bun3/src/deps/zlib.posix.zig -Masync=/Users/donisaac/Documents/bun/bun3/src/async/posix_event_loop.zig -MZigGeneratedClasses=/Users/donisaac/Documents/bun/bun3/build/debug/codegen/ZigGeneratedClasses.zig -MResolvedSourceTag=/Users/donisaac/Documents/bun/bun3/build/debug/codegen/ResolvedSourceTag.zig -MErrorCode=/Users/donisaac/Documents/bun/bun3/build/debug/codegen/ErrorCode.zig -Mcompletions-bash=/Users/donisaac/Documents/bun/bun3/completions/bun.bash -Mcompletions-zsh=/Users/donisaac/Documents/bun/bun3/completions/bun.zsh -Mcompletions-fish=/Users/donisaac/Documents/bun/bun3/completions/bun.fish -Mbuild_options=/Users/donisaac/Documents/bun/bun3/build/debug/cache/zig/local/c/da395d47caf5898cd97464e862dd658f/options.zig -ODebug -target aarch64-macos.13.0-none -mcpu apple_m1 -Mtranslated-c-headers=/Users/donisaac/Documents/bun/bun3/build/debug/cache/zig/local/o/cb2c87e16a2efe8826ab5fa894a3fcf0/c-headers-for-zig.zig -lc++ -lc --test-runner /Users/donisaac/Documents/bun/bun3/src/main_test.zig -ffunction-sections -fdata-sections -fallow-shlib-undefined --cache-dir /Users/donisaac/Documents/bun/bun3/build/debug/cache/zig/local --global-cache-dir /Users/donisaac/Documents/bun/bun3/build/debug/cache/zig/global --name bun-test -fno-compiler-rt -fno-ubsan-rt --zig-lib-dir /Users/donisaac/Documents/bun/bun3/vendor/zig/lib/ --listen=-
Build Summary: 2/5 steps succeeded; 1 failed
test transitive failure
└─ install generated to bun-test.o transitive failure
   └─ zig test bun-test Debug aarch64-macos.13.0-none failure
      ├─ options cached
      └─ translate-c cached 25ms MaxRSS:35M
error: the following build command failed with exit code 1:
/Users/donisaac/Documents/bun/bun3/build/debug/cache/zig/local/o/56600897baa72f85f47b0344254a5fd8/build /Users/donisaac/Documents/bun/bun3/vendor/zig/zig /Users/donisaac/Documents/bun/bun3/vendor/zig/lib /Users/donisaac/Documents/bun/bun3 /Users/donisaac/Documents/bun/bun3/build/debug/cache/zig/local /Users/donisaac/Documents/bun/bun3/build/debug/cache/zig/global --seed 0xd21820a4 -Zedd4454974fed375 test --prefix /Users/donisaac/Documents/bun/bun3/build/debug -Dobj_format=obj -Dtarget=aarch64-macos-none -Doptimize=Debug -Dcpu=apple_m1 -Denable_logs=true -Dversion=1.2.6 -Dreported_nodejs_version=22.6.0 -Dcanary=1 -Dcodegen_path=/Users/donisaac/Documents/bun/bun3/build/debug/codegen -Dcodegen_embed=false --prominent-compile-errors --summary all -Dsha=9c1d6fdd2b6f6849fe39cbaf2d0db5af8f9aaef0
FAILED: bun-test.o /Users/donisaac/Documents/bun/bun3/build/debug/bun-test.o
cd /Users/donisaac/Documents/bun/bun3 && /Users/donisaac/Documents/bun/bun3/vendor/zig/zig build test --cache-dir /Users/donisaac/Documents/bun/bun3/build/debug/cache/zig/local --global-cache-dir /Users/donisaac/Documents/bun/bun3/build/debug/cache/zig/global --zig-lib-dir /Users/donisaac/Documents/bun/bun3/vendor/zig/lib --prefix /Users/donisaac/Documents/bun/bun3/build/debug -Dobj_format=obj -Dtarget=aarch64-macos-none -Doptimize=Debug -Dcpu=apple_m1 -Denable_logs=true -Dversion=1.2.6 -Dreported_nodejs_version=22.6.0 -Dcanary=1 -Dcodegen_path=/Users/donisaac/Documents/bun/bun3/build/debug/codegen -Dcodegen_embed=false --prominent-compile-errors --summary all -Dsha=9c1d6fdd2b6f6849fe39cbaf2d0db5af8f9aaef0

Expected Behavior

Build succeeds and tests run.

@DonIsaac added the "bug" label (Observed behavior contradicts documented or intended behavior) Mar 24, 2025
@mlugg
Member

mlugg commented Apr 1, 2025

I've managed to reduce this. Unfortunately, it's a very deep bug which demonstrates fundamental flaws with our type resolution strategy.

The idea here is to abuse the fact that while resolving recursive types, we treat a type as successfully resolved if its resolution is WIP (below us on the call stack). That assumption isn't valid, because a later field of that WIP type might cause resolution to fail! This means we can end up in a situation where a type A is fully resolved, but it contains a pointer to a type B which is not fully resolved due to compile errors. That's precisely what full type resolution is meant to avoid, so clearly this system isn't working as intended.
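As a rough illustration of that flaw, here is a toy model of the "WIP counts as success" shortcut. This is not the compiler's actual code: `State`, `Ty`, `referenced`, `layoutFails`, and `resolveFully` are hypothetical names, and the three-type graph is an assumption chosen so that `a` refers to `b` and `invalid`, while `b` refers back to `a`.

```zig
const std = @import("std");

// Toy model only: every name here is hypothetical, not taken from the compiler.
const State = enum { unresolved, wip, resolved, failed };
const Ty = enum { a, b, invalid };

const a_refs = [_]Ty{ .b, .invalid };
const b_refs = [_]Ty{.a};
const no_refs = [_]Ty{};

/// Types that must also be fully resolved before `ty` counts as fully resolved.
fn referenced(ty: Ty) []const Ty {
    return switch (ty) {
        .a => &a_refs,
        .b => &b_refs,
        .invalid => &no_refs,
    };
}

/// Stand-in for "this type has a compile error in its own layout".
fn layoutFails(ty: Ty) bool {
    return ty == .invalid;
}

/// Returns true if `ty` is considered fully resolved.
fn resolveFully(states: *[3]State, ty: Ty) bool {
    const i = @intFromEnum(ty);
    switch (states[i]) {
        .resolved => return true,
        .failed => return false,
        // The flawed assumption: a type still being resolved further up
        // the call stack is treated as if it had already succeeded.
        .wip => return true,
        .unresolved => {},
    }
    states[i] = .wip;
    if (layoutFails(ty)) {
        states[i] = .failed;
        return false;
    }
    for (referenced(ty)) |child| {
        if (!resolveFully(states, child)) {
            states[i] = .failed;
            return false;
        }
    }
    states[i] = .resolved;
    return true;
}

test "b is marked resolved although a, which it depends on, failed" {
    var states = [_]State{.unresolved} ** 3;
    try std.testing.expect(!resolveFully(&states, .a));
    try std.testing.expectEqual(State.failed, states[@intFromEnum(Ty.a)]);
    try std.testing.expectEqual(State.resolved, states[@intFromEnum(Ty.b)]);
}
```

Running this with `zig test` should show the inconsistent end state: `a` is marked failed, yet `b`, which can only be valid if `a` is, stays marked resolved. Any later stage that trusts `b`'s state can then walk into the unresolved `invalid` type.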

Here's the smallest I managed to get the repro, heavily commented:

/// `Invalid` is a type which fails layout/full resolution.
/// The `x` field is there to make sure the type has runtime bits to avoid triggering #14903 instead.
const Invalid = struct { x: u8, y: Invalid };

/// This type, `A`, is what we're going to want to fully resolve.
/// The type resolution process looks like this:
/// * We mark `A` as WIP
/// * We recurse into `B`
/// * `B` sees that `A` is WIP so does not recurse; it marks itself as fully resolved since all fields have been checked
/// * Back in `A`, we recurse into `Invalid`
/// * We see an error, so `A` is *not* marked as fully resolved
/// * Ultimately, `A` is correctly marked as failed, but `B` is marked as fully resolved
const A = struct {
    b: *B,
    x: *Invalid,
};
/// This is the type whose resolution wrongly succeeds.
const B = struct {
    a: A,
};

comptime {
    // First, we queue full resolution of `A`.
    // Per the steps above, when this happens, `B` will incorrectly succeed full resolution.
    _ = A;
    // We want to analyze `bar` *after* fully resolving `A`, but usually, the compiler will
    // queue function analysis before type resolution. To get the ordering we want, we use
    // a wrapper type, whose nested `comptime` decl will be analyzed after the resolution we
    // queued above.
    _ = struct {
        comptime {
            _ = &bar;
        }
    };
}

/// This is the function whose analysis triggers the compiler crash.
/// It fails because `Air.types_resolved` determines that every type here is fully resolved,
/// but `codegen.llvm` later hits some unresolved state (the runtime field order of `Invalid`).
fn bar(x: *B) void {
    _ = x;
}
Stack Trace
thread 22068 panic: reached unreachable code
Unwind error at address `:0x5a0a604` (error.AddressOutOfRange), trace may be incomplete

/home/mlugg/zig/master/src/InternPool.zig:3644:32: 0x68952c3 in toInt (main.zig)
                .unresolved => unreachable,
                               ^
/home/mlugg/zig/master/src/InternPool.zig:4082:72: 0x62e24e8 in next (main.zig)
                return it.struct_type.runtime_order.get(it.ip)[i].toInt();
                                                                       ^
/home/mlugg/zig/master/src/codegen/llvm.zig:2318:31: 0x6057381 in lowerDebugType (main.zig)
                while (it.next()) |field_index| {
                              ^
/home/mlugg/zig/master/src/codegen/llvm.zig:1931:59: 0x604e4a3 in lowerDebugType (main.zig)
                const debug_elem_ty = try o.lowerDebugType(Type.fromInterned(ptr_info.child));
                                                          ^
/home/mlugg/zig/master/src/codegen/llvm.zig:2331:45: 0x6057a1e in lowerDebugType (main.zig)
                        try o.lowerDebugType(field_ty),
                                            ^
/home/mlugg/zig/master/src/codegen/llvm.zig:2331:45: 0x6057a1e in lowerDebugType (main.zig)
                        try o.lowerDebugType(field_ty),
                                            ^
/home/mlugg/zig/master/src/codegen/llvm.zig:1931:59: 0x604e4a3 in lowerDebugType (main.zig)
                const debug_elem_ty = try o.lowerDebugType(Type.fromInterned(ptr_info.child));
                                                          ^
/home/mlugg/zig/master/src/codegen/llvm.zig:2554:84: 0x605e7a9 in lowerDebugType (main.zig)
                        debug_param_types.appendAssumeCapacity(try o.lowerDebugType(param_ty));
                                                                                   ^
/home/mlugg/zig/master/src/codegen/llvm.zig:1415:57: 0x75207ec in updateFunc (main.zig)
            const debug_decl_type = try o.lowerDebugType(fn_ty);
                                                        ^
/home/mlugg/zig/master/src/link/Elf.zig:2379:70: 0x88b1258 in updateFunc (main.zig)
    if (self.llvm_object) |llvm_object| return llvm_object.updateFunc(pt, func_index, air, liveness);
                                                                     ^
/home/mlugg/zig/master/src/link.zig:747:82: 0x7f87566 in updateFunc (main.zig)
                return @as(*tag.Type(), @fieldParentPtr("base", base)).updateFunc(pt, func_index, air, liveness);
                                                                                 ^
/home/mlugg/zig/master/src/Zcu/PerThread.zig:1711:22: 0x7527bf6 in linkerUpdateFunc (main.zig)
        lf.updateFunc(pt, func_index, air, liveness) catch |err| switch (err) {
                     ^
/home/mlugg/zig/master/src/link.zig:1602:36: 0x6e67407 in doTask (main.zig)
                pt.linkerUpdateFunc(func.func, func.air) catch |err| switch (err) {
                                   ^
/home/mlugg/zig/master/src/Compilation.zig:4145:20: 0x67bbede in dispatchCodegenTask (main.zig)
        link.doTask(comp, tid, link_task);
                   ^
/home/mlugg/zig/master/src/Compilation.zig:4040:37: 0x629075e in processOneJob (main.zig)
            comp.dispatchCodegenTask(tid, .{ .codegen_func = func });
                                    ^
/home/mlugg/zig/master/src/Compilation.zig:3990:30: 0x5f95f56 in performAllTheWorkInner (main.zig)
            try processOneJob(@intFromEnum(Zcu.PerThread.Id.main), comp, job);
                             ^
/home/mlugg/zig/master/src/Compilation.zig:3730:36: 0x5d6b4dd in performAllTheWork (main.zig)
    try comp.performAllTheWorkInner(main_progress_node);
                                   ^
/home/mlugg/zig/master/src/Compilation.zig:2333:31: 0x5b8af97 in update (main.zig)
    try comp.performAllTheWork(main_progress_node);
                              ^
/home/mlugg/zig/master/src/main.zig:4530:20: 0x5bd1acd in updateModule (main.zig)
    try comp.update(prog_node);
                   ^
/home/mlugg/zig/master/src/main.zig:3720:21: 0x5c4ef11 in buildOutputType (main.zig)
        updateModule(comp, color, root_prog_node) catch |err| switch (err) {
                    ^
/home/mlugg/zig/master/src/main.zig:277:31: 0x5cb8963 in mainArgs (main.zig)
        return buildOutputType(gpa, arena, args, .{ .build = .Obj });
                              ^
/home/mlugg/zig/master/src/main.zig:212:20: 0x5afd22b in main (main.zig)
    return mainArgs(gpa, arena, args);
                   ^
/home/mlugg/zig/master/lib/std/start.zig:656:37: 0x5afac89 in main (std.zig)
            const result = root.main() catch |err| {
                                    ^
Aborted

Between this issue, #23400, #23362, #20134, #19920, #14903, and other related issues, it seems clear to me that there are fundamental flaws in how the Zig compiler handles type resolution. We need to figure out a new approach.

@mlugg added this to the 0.15.0 milestone Apr 1, 2025
@mlugg changed the title from "Crash in InternPool.LoadedStructType.RuntimeOrderIterator" to "Invalid assumptions in type resolution logic cause incorrect resolution success, crashing compiler" Apr 1, 2025