Skip to content

std.debug: fix some corner cases #23927

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

rootbeer
Copy link
Contributor

Add infinite loop detection to the std.debug backtraces. Make the backtrace and stacktrace code more robust on corner-case architectures.

Expand the "unwind.zig" test case to exercise std.debug.dumpCurrentStackTrace(). And trigger a signal handler so the test can exercise std.debug.dumpStackTraceFromBase() and std.debug.StackIterator.initWithContext() using a kernel-constructed context.

This is preparation for moving std.debug away from getContext() (#23801).

@alexrp alexrp self-assigned this May 19, 2025
// Getting the backtrace inside the signal handler (with the ucontext_t)
// gets stuck in a loop on some systems:
const expect_signal_frame_overflow =
(native_arch == .arm and link_libc); // loops above main()
Copy link
Member

@alexrp alexrp May 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

glibc vs musl? armeb, thumb, thumbeb?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't investigated the failure cases too closely yet. I'm just trying to get the test to compile and not blow up on every architecture. And I'm trying to avoid watering the test down too much on the platforms where it works reliably. That said, the failure I see for this one is when statically linking musl to the test case.

I haven't been building the other ARM variants, so I'll try and mix those in too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see this code was unchanged, so do the others work?

Comment on lines 39 to 44
native_arch == .mips or
native_arch == .mipsel or
native_arch == .mips64 or
native_arch == .mips64el or
native_arch == .powerpc64 or
native_arch == .powerpc64le;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's wrong with these?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally they don't seem to generate traces at all (either through dumpCurrentStackTrace or with StackIterator). Zig doesn't have a ucontext_t on MIPS. I'm not sure what's up with the PowerPC ones.

On a related note, Zig CI failed on aarch64-linux (no libc) because the stack trace gets stuck in a loop above main (see https://github.com/ziglang/zig/actions/runs/15104670128/job/42451228605?pr=23927). So I've added .aarch64 to this list of ignorable failures for now. Its flaky though. The test sometimes builds traces without looping for me locally, and sometimes not. (From what I can tell a specific build of test test is deterministic, but across multiple builds its not.)

rootbeer added 2 commits May 19, 2025 14:36
This test creates three nested stack frames and then tests stack trace
creation.  Add some additional tests of stack traces by invoking
"dumpCurrentStackTrace()" and by using a signal handler's "context"
parameter to feed backtrace construction.

Make the test case at least runnable on a wide variety of systems
(including Windows, and WASI).  Because `ucontext_t` and `getcontext` are
not evenly supported everywhere, some systems are expected only get
through parts of the test.
@rootbeer rootbeer force-pushed the debug-mcontext-stage1 branch from b95af2b to 37ebc96 Compare May 19, 2025 22:41
@rootbeer rootbeer marked this pull request as ready for review May 20, 2025 15:49
@rootbeer
Copy link
Contributor Author

This is ready for a review. I think the actual fixes are all straightforward, but the test is generating a lot of stderr spew (both from the dump-stack-trace functions being tested and my verbose std.debug.print logging. This all shows up in "passing" test output (e.g., https://github.com/ziglang/zig/actions/runs/15124641832/job/42514310709?pr=23927#step:3:2243). Is there something I can do in the build.zig to hide the stderr output when the test passes? Or should I remove my print statements? But, the dump-stack-trace functions are doing most of the logging (and I suspect its generally confusing to see stack traces in passing tests). Should I disable those? (The StackIterator testing covers the meat of the code, but it does seem like the complete routines should get some testing ...)

@alexrp
Copy link
Member

alexrp commented May 20, 2025

Is there something I can do in the build.zig to hide the stderr output when the test passes?

You can capture the output from the build script, e.g. by adding an "expected output" check. See for example #23892.

@@ -13,6 +13,9 @@ const native_arch = builtin.cpu.arch;
const native_os = builtin.os.tag;
const native_endian = native_arch.endian();

/// Maximum distance to walk when iterating through a stack trace.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// Maximum distance to walk when iterating through a stack trace.
/// Maximum number of frames to walk when iterating through a stack trace.

Comment on lines +32 to +33
(native_arch != .wasm32) and
(native_arch != .wasm64); // wasm has no introspection
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Suggested change
(native_arch != .wasm32) and
(native_arch != .wasm64); // wasm has no introspection
!native_arch.isWasm(); // wasm has no introspection

// gets stuck in a loop on some systems:
const expect_signal_frame_overflow =
(native_arch == .arm and link_libc) or // loops above main()
(native_arch == .aarch64); // non-deterministic, sometimes overflows, sometimes not
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aarch64_be?

(native_arch == .x86_64 and link_libc and builtin.abi.isGnu()) or // stuck on pthread_kill?
(native_arch == .x86_64 and link_libc and builtin.abi.isMusl() and builtin.omit_frame_pointer) or // immediately confused backtrace
(native_arch == .x86_64 and builtin.os.tag.isDarwin()) or // immediately confused backtrace
(native_arch == .aarch64 or native_arch == .aarch64_be) or // non-deterministic, sometimes overflows, sometimes confused
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Suggested change
(native_arch == .aarch64 or native_arch == .aarch64_be) or // non-deterministic, sometimes overflows, sometimes confused
native_arch.isAARCH64() or // non-deterministic, sometimes overflows, sometimes confused

(native_arch == .x86_64 and link_libc and builtin.abi.isMusl() and builtin.omit_frame_pointer) or // immediately confused backtrace
(native_arch == .x86_64 and builtin.os.tag.isDarwin()) or // immediately confused backtrace
(native_arch == .aarch64 or native_arch == .aarch64_be) or // non-deterministic, sometimes overflows, sometimes confused
(native_arch == .riscv64 and link_libc) or // `ucontext_t` not defined yet
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

riscv32? Also has no ucontext_t IIRC.

Comment on lines +49 to +52
native_arch == .mips or // Missing ucontext_t. Most stack traces are empty ... (with or without libc)
native_arch == .mipsel or // same as .mips
native_arch == .mips64 or // same as .mips
native_arch == .mips64el or // same as .mips
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Suggested change
native_arch == .mips or // Missing ucontext_t. Most stack traces are empty ... (with or without libc)
native_arch == .mipsel or // same as .mips
native_arch == .mips64 or // same as .mips
native_arch == .mips64el or // same as .mips
native_arch.isMIPS() or // Missing ucontext_t. Most stack traces are empty ... (with or without libc)

Comment on lines +53 to +54
native_arch == .powerpc64 or // dumpCurrent* useless, StackIterator empty, ctx-based trace empty (with or without libc)
native_arch == .powerpc64le; // same as .powerpc64
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Suggested change
native_arch == .powerpc64 or // dumpCurrent* useless, StackIterator empty, ctx-based trace empty (with or without libc)
native_arch == .powerpc64le; // same as .powerpc64
native_arch.isPowerPC64() or // dumpCurrent* useless, StackIterator empty, ctx-based trace empty (with or without libc)

Also what about powerpc?

@rootbeer
Copy link
Contributor Author

@alexrp Thanks again for the reviews! One more question before I push a new version up: Should I make changes anywhere to get this test to compile/run against targets other than the default on CI?

@alexrp
Copy link
Member

alexrp commented May 20, 2025

I guess you could just change the test's build.zig to ignore the standard target option and instead build & run for a predefined set of targets? test/llvm_targets.zig is a good resource for a list of targets that are actually relevant/real (although some of them we obviously don't fully support yet).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants