[PROF-12743] Runtime stack collection callback registration #4984
Merged: gyuheon0h merged 37 commits into master from gyuheon0h/prof-12743-runtime-stack-callback, Dec 15, 2025 (+335 −7)
Changes from 29 commits
Commits
07dca23 First commit; declare libdatadog ext api (gyuheon0h)
441b302 Dummy frame emitted (gyuheon0h)
a2d4dbe Include internal structures (at least try to)) (gyuheon0h)
ab81e8f Can walk the stack, but no func names yet (gyuheon0h)
434f51f Add safety checks when walking stack (gyuheon0h)
3bc90d8 Get C frames, and also add some AI generated sanity check tests (gyuheon0h)
c996800 Clean up tests (gyuheon0h)
e008db7 Use all safety checks (gyuheon0h)
f4c2e7b Category specific runtime callback (gyuheon0h)
c1af1a3 Move runtime stuff to own file (gyuheon0h)
2f39282 Try to fix issues across diff ruby versions (gyuheon0h)
e8c14ee remove early continue (gyuheon0h)
1daed6f 3.3 issues (gyuheon0h)
0b8401e Fallback to common headers is MJIT headers fail? (gyuheon0h)
86d25f9 Move into own extension (gyuheon0h)
7e6dd2a Move into prof extension (gyuheon0h)
060f872 Clean up (gyuheon0h)
ce79b31 Gate behind flag (gyuheon0h)
db8cb2b Respond to code cleanliness comments; still need to fix logic/impl (gyuheon0h)
348db9e No need to do minimal utf8 checking (gyuheon0h)
17609db Register callback before starting CT (gyuheon0h)
aa014d0 Don't run separate script for runtime stacks spec (gyuheon0h)
7ce96e1 Remove env flag for rt stacks (gyuheon0h)
3be8580 Clean up iseq body validation logic (gyuheon0h)
6c8e8f1 Remove double frame skipping bug and unsafe dereference (gyuheon0h)
103dd46 Further clean up and small logic fixes (gyuheon0h)
01ff759 Respond to review (spec clean up, redundant steps removed, etc) (gyuheon0h)
acc5cf1 Just register on native side always, add comment on future work to mo… (gyuheon0h)
4dfcbe2 Lets reuse profiling api to traverse frames (gyuheon0h)
2e677b9 Gate for linux only, combine string validation utils, define max stac… (gyuheon0h)
fbd6ebd Emit placeholder frames for no runtime stacks and truncation (gyuheon0h)
998cca7 Use rb_typeddata_is_kind_of and dont touch rb_thread_t (gyuheon0h)
5ab8327 Dont access directly and private ruby stuff (gyuheon0h)
c79e6fa Reduce mincore call (gyuheon0h)
1f6b623 improve placeholder clarity (gyuheon0h)
4feafcc Raise exception on registration failure (gyuheon0h)
27e0681 Fail w error message if profiling not supported for spec, fix truncat… (gyuheon0h)
261 changes: 261 additions & 0 deletions
ext/datadog_profiling_native_extension/crashtracking_runtime_stacks.c
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,261 @@ | ||
| // NOTE: This file is a part of the profiling native extension even though the | ||
| // runtime stacks feature is consumed by the crashtracker. The profiling | ||
| // extension already carries all the Ruby VM private header access and build | ||
| // plumbing required to safely poke at internal structures. Sharing that setup | ||
| // avoids duplicating another native extension with the same (fragile) access | ||
| // patterns, and keeps the overall install/build surface area smaller. | ||
| #include "extconf.h" | ||
|
|
||
| #ifdef RUBY_MJIT_HEADER | ||
| // Pick up internal structures from the private Ruby MJIT header file | ||
| #include RUBY_MJIT_HEADER | ||
| #else | ||
| // The MJIT header was introduced on 2.6 and removed on 3.3; for other Rubies we rely on | ||
| // the datadog-ruby_core_source gem to get access to private VM headers. | ||
|
|
||
| // We can't do anything about warnings in VM headers, so we just use this technique to suppress them. | ||
| // See https://nelkinda.com/blog/suppress-warnings-in-gcc-and-clang/#d11e364 for details. | ||
| #pragma GCC diagnostic push | ||
| #pragma GCC diagnostic ignored "-Wunused-parameter" | ||
| #pragma GCC diagnostic ignored "-Wattributes" | ||
| #pragma GCC diagnostic ignored "-Wpragmas" | ||
| #pragma GCC diagnostic ignored "-Wexpansion-to-defined" | ||
| #include <vm_core.h> | ||
| #pragma GCC diagnostic pop | ||
|
|
||
| #pragma GCC diagnostic push | ||
| #pragma GCC diagnostic ignored "-Wunused-parameter" | ||
| #include <iseq.h> | ||
| #pragma GCC diagnostic pop | ||
|
|
||
| #include <ruby.h> | ||
|
|
||
| #ifndef NO_RACTOR_HEADER_INCLUDE | ||
| #pragma GCC diagnostic push | ||
| #pragma GCC diagnostic ignored "-Wunused-parameter" | ||
| #include <ractor_core.h> | ||
| #pragma GCC diagnostic pop | ||
| #endif | ||
| #endif | ||
|
|
||
| #include <datadog/crashtracker.h> | ||
| #include "datadog_ruby_common.h" | ||
| #include "private_vm_api_access.h" | ||
| #include <sys/mman.h> | ||
| #include <unistd.h> | ||
| #include <errno.h> | ||
| #include <string.h> | ||
|
|
||
| static const rb_data_type_t *crashtracker_thread_data_type = NULL; | ||
|
|
||
| static void ruby_runtime_stack_callback( | ||
| void (*emit_frame)(const ddog_crasht_RuntimeStackFrame*) | ||
| ); | ||
|
|
||
| // Use a fixed, preallocated buffer for crash-time runtime stacks to avoid | ||
| // heap allocation in the signal/crash path. | ||
| #define RUNTIME_STACK_MAX_FRAMES 512 | ||
| static frame_info runtime_stack_buffer[RUNTIME_STACK_MAX_FRAMES]; | ||
|
|
||
| #if defined(__x86_64__) | ||
| # define SYS_MINCORE 0x1B | ||
| #elif defined(__aarch64__) | ||
| # define SYS_MINCORE 0xE8 | ||
| #endif | ||
|
|
||
| long syscall(long number, ...); | ||
|
|
||
| // align down to power of two | ||
| static inline uintptr_t align_down(uintptr_t x, uintptr_t align) { | ||
| return x & ~(align - 1u); | ||
| } | ||
|
|
||
| // This function is not necessarily Ruby specific. This will be moved to | ||
| // `libdatadog` in the future as a shared utility function. | ||
| static inline bool is_pointer_readable(const void *ptr, size_t size) { | ||
| if (!ptr || size == 0) return false; | ||
|
|
||
| uintptr_t page_size = (uintptr_t)sysconf(_SC_PAGESIZE); | ||
| // fallback for weird value; 0 or not a power of two | ||
| if (page_size == 0 || (page_size & (page_size - 1u))) { | ||
| page_size = 4096; | ||
| } | ||
|
|
||
| const uintptr_t start = align_down((uintptr_t)ptr, page_size); | ||
| const uintptr_t end = ((uintptr_t)ptr + size - 1u); | ||
| const uintptr_t last = align_down(end, page_size); | ||
|
|
||
| // Number of pages to probe: 1 or 2. vec holds two entries, so a range that spans | ||
| // more than two pages only has its first two pages checked. | ||
| size_t pages = 1u + (last != start); | ||
|
|
||
| unsigned char vec[2]; | ||
|
|
||
| int retries = 5; | ||
| for (;;) { | ||
| size_t len = pages * (size_t)page_size; | ||
| long rc = syscall(SYS_MINCORE, (void*)start, len, vec); | ||
|
|
||
| if (rc == 0) { | ||
| return true; | ||
| } | ||
|
|
||
| int e = errno; | ||
| if (e == ENOMEM || e == EFAULT) { | ||
| return false; | ||
| } | ||
|
|
||
| if (e == EAGAIN && retries-- > 0) { | ||
| continue; | ||
| } | ||
|
|
||
| // Unknown errno, we assume mapped to avoid cascading faults in crash path | ||
| return true; | ||
| } | ||
| } | ||
|
|
||
| static inline ddog_CharSlice char_slice_from_cstr(const char *cstr) { | ||
| if (cstr == NULL) { | ||
| return (ddog_CharSlice){.ptr = NULL, .len = 0}; | ||
| } | ||
| return (ddog_CharSlice){.ptr = cstr, .len = strlen(cstr)}; | ||
| } | ||
|
|
||
| // Heuristically validate a Ruby string VALUE before dereferencing it from crash context. | ||
| // The caller must already have confirmed that it is a String heap object (no immediates/symbols). | ||
| // 1) confirm both the common header (RBasic) and the full string struct (RString) live in | ||
| // readable pages to avoid faulting while inspecting potentially corrupted memory | ||
| // 2) enforce an upper bound on the lengths we expect to emit (file names, function names) | ||
| static bool is_reasonable_string_size(VALUE str) { | ||
| if (str == Qnil) return false; | ||
|
|
||
| // After RB_TYPE_P confirms this VALUE is a heap string, the tagged VALUE is | ||
| // guaranteed to be an aligned pointer, so casting to void* is equivalent to | ||
| // RBASIC(str); we verify the object header is readable before touching it. | ||
| if (!is_pointer_readable((const void *)str, sizeof(struct RBasic))) return false; | ||
|
|
||
| // For strings, we need to check the full RString structure | ||
| if (!is_pointer_readable((const void *)str, sizeof(struct RString))) return false; | ||
|
|
||
| long len = RSTRING_LEN(str); | ||
|
|
||
| if (len < 0) return false; // Negative length, probably corrupted | ||
| if (len > 1024) return false; // > 1KB path/function name, sus | ||
|
|
||
| return true; | ||
| } | ||
|
|
||
| static const char* safe_string_ptr(VALUE str) { | ||
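| if (str == Qnil) return "<nil>"; | ||
| // RB_TYPE_P reads the object's flags, so rule out immediates and unreadable headers first. | ||
| if (RB_SPECIAL_CONST_P(str)) return "<not_string>"; | ||
| if (!is_pointer_readable((const void *)str, sizeof(struct RBasic))) return "<corrupted>"; | ||
| if (!RB_TYPE_P(str, T_STRING)) return "<not_string>"; | ||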
|
|
||
| // Validate the VALUE first before touching any of its internals | ||
| if (!is_reasonable_string_size(str)) return "<corrupted>"; | ||
|
|
||
| long len = RSTRING_LEN(str); | ||
| const char *ptr = RSTRING_PTR(str); | ||
|
|
||
| if (!ptr) return "<null>"; | ||
|
|
||
| if (!is_pointer_readable(ptr, len > 0 ? len : 1)) return "<unreadable>"; | ||
|
|
||
| return ptr; | ||
| } | ||
|
|
||
| // Collect the crashing thread's frames via ddtrace_rb_profile_frames into a static buffer, then emit | ||
| // them newest-first. If corruption is detected, emit placeholder frames so the crash report still | ||
| // completes. We lean on the Ruby VM helpers we already use for profiling and rely on crashtracker's | ||
| // safety nets so a failure here should not impact customers. | ||
| static void ruby_runtime_stack_callback( | ||
| void (*emit_frame)(const ddog_crasht_RuntimeStackFrame*) | ||
| ) { | ||
| // Grab the Ruby thread we crashed on; crashtracker only runs once. | ||
| VALUE current_thread = rb_thread_current(); | ||
| if (current_thread == Qnil) return; | ||
|
|
||
| if (crashtracker_thread_data_type == NULL) return; | ||
|
|
||
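| // rb_check_typeddata raises on a type mismatch, which is unsafe in the crash path; probe first. | ||
| if (!rb_typeddata_is_kind_of(current_thread, crashtracker_thread_data_type)) return; | ||
| rb_thread_t *th = (rb_thread_t *) rb_check_typeddata(current_thread, crashtracker_thread_data_type); | ||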
| if (!th || !is_pointer_readable(th, sizeof(*th))) return; | ||
|
|
||
| // Use the profiling helper to gather frames into our static buffer. | ||
| int frame_count = ddtrace_rb_profile_frames( | ||
| current_thread, | ||
| 0, | ||
| RUNTIME_STACK_MAX_FRAMES, | ||
| runtime_stack_buffer | ||
| ); | ||
|
|
||
| if (frame_count <= 0) return; | ||
|
|
||
| for (int i = frame_count - 1; i >= 0; i--) { | ||
| frame_info *info = &runtime_stack_buffer[i]; | ||
|
|
||
| if (info->is_ruby_frame) { | ||
| const rb_iseq_t *iseq = (const rb_iseq_t *)info->as.ruby_frame.iseq; | ||
| const char *function_name = "<unknown>"; | ||
| const char *file_name = "<unknown>"; | ||
|
|
||
| if (iseq && is_pointer_readable(iseq, sizeof(rb_iseq_t))) { | ||
| VALUE name = rb_iseq_base_label(iseq); | ||
| if (name != Qnil) { | ||
| function_name = safe_string_ptr(name); | ||
| } | ||
|
|
||
| VALUE filename = rb_iseq_path(iseq); | ||
| if (filename != Qnil) { | ||
| file_name = safe_string_ptr(filename); | ||
| } | ||
| } | ||
|
|
||
| ddog_crasht_RuntimeStackFrame frame = { | ||
| .type_name = char_slice_from_cstr(NULL), | ||
| .function = char_slice_from_cstr(function_name), | ||
| .file = char_slice_from_cstr(file_name), | ||
| .line = info->as.ruby_frame.line, | ||
| .column = 0 | ||
| }; | ||
|
|
||
| emit_frame(&frame); | ||
| } else { | ||
| const char *function_name = "<C method>"; | ||
| const char *file_name = "<C extension>"; | ||
|
|
||
| if (info->as.native_frame.method_id) { | ||
| const char *method_name = rb_id2name(info->as.native_frame.method_id); | ||
| if (is_pointer_readable(method_name, 256)) { | ||
| size_t method_name_len = strnlen(method_name, 256); | ||
| if (method_name_len > 0 && method_name_len < 256) { | ||
| function_name = method_name; | ||
| } | ||
| } | ||
| } | ||
|
|
||
| ddog_crasht_RuntimeStackFrame frame = { | ||
| .type_name = char_slice_from_cstr(NULL), | ||
| .function = char_slice_from_cstr(function_name), | ||
| .file = char_slice_from_cstr(file_name), | ||
| .line = 0, | ||
| .column = 0 | ||
| }; | ||
|
|
||
| emit_frame(&frame); | ||
| } | ||
| } | ||
| } | ||
|
|
||
| void crashtracking_runtime_stacks_init(void) { | ||
| if (crashtracker_thread_data_type == NULL) { | ||
| VALUE current_thread = rb_thread_current(); | ||
| if (current_thread == Qnil) return; | ||
|
|
||
| const rb_data_type_t *thread_data_type = RTYPEDDATA_TYPE(current_thread); | ||
| if (!thread_data_type) return; | ||
|
|
||
| crashtracker_thread_data_type = thread_data_type; | ||
| } | ||
|
|
||
| // Register immediately so Ruby doesn't need to manage this explicitly. | ||
| ddog_crasht_register_runtime_frame_callback(ruby_runtime_stack_callback); | ||
| } | ||
|
|
||
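The page arithmetic inside `is_pointer_readable` can be looked at in isolation. The following is a minimal standalone sketch, not code from this PR: `pages_spanned` is a hypothetical helper name, and it assumes `size > 0`, a power-of-two page size, and no overflow in `addr + size`.

```c
#include <stddef.h>
#include <stdint.h>

// Round x down to a multiple of align; align must be a power of two.
static inline uintptr_t align_down(uintptr_t x, uintptr_t align) {
  return x & ~(align - 1u);
}

// Hypothetical helper (not in the PR): number of pages touched by the
// byte range [addr, addr + size). Assumes size > 0 and no overflow.
static inline size_t pages_spanned(uintptr_t addr, size_t size, uintptr_t page_size) {
  uintptr_t first = align_down(addr, page_size);
  uintptr_t last = align_down(addr + size - 1u, page_size);
  return (size_t)((last - first) / page_size) + 1u;
}
```

With 4 KiB pages, a 16-byte object starting 8 bytes before a page boundary spans two pages, which is the case the two-entry `vec` in the diff is sized for.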
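As a rough illustration of the `mincore`-based probe, the sketch below uses the libc `mincore(2)` wrapper on Linux rather than the raw `syscall` declared in the diff. `page_is_mapped` is a hypothetical name. Two caveats apply: `mincore` reports whether a range is mapped, not whether it is readable (a `PROT_NONE` page is still mapped), and this sketch treats every failure as "not mapped", whereas the PR assumes "mapped" on unknown errors.

```c
#define _DEFAULT_SOURCE
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

// Hypothetical helper: true if the page containing ptr is mapped in this
// process. Linux-specific; mincore(2) fails with ENOMEM on unmapped ranges.
static bool page_is_mapped(const void *ptr) {
  long page_size = sysconf(_SC_PAGESIZE);
  if (page_size <= 0) return false;

  // mincore requires a page-aligned start address.
  uintptr_t page = (uintptr_t)ptr & ~((uintptr_t)page_size - 1u);
  unsigned char vec[1];  // one residency byte per probed page

  // Conservative choice for this sketch: any error means "do not touch".
  return mincore((void *)page, (size_t)page_size, vec) == 0;
}
```

A caller would probe a pointer before dereferencing it, for example `if (page_is_mapped(p)) { /* read *p */ }`, accepting that the mapping can still change between the probe and the read.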
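The native-frame branch of the callback only accepts a method name when `strnlen` finds a non-empty string shorter than the bound, and otherwise keeps the placeholder. A minimal standalone sketch of that pattern follows; `bounded_name_or` is a hypothetical helper, not part of the PR, and it assumes the first `max_len` bytes of `name` have already been checked for readability by the caller.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <string.h>

// Hypothetical helper: return name if it is a non-empty C string strictly
// shorter than max_len bytes, otherwise return fallback.
// Assumes the caller has verified that name[0 .. max_len) is readable.
static const char *bounded_name_or(const char *name, size_t max_len, const char *fallback) {
  if (name == NULL) return fallback;
  size_t len = strnlen(name, max_len);  // never reads past max_len bytes
  return (len > 0 && len < max_len) ? name : fallback;
}
```

Requiring `len < max_len` means a terminator was actually seen inside the bound, so a later unbounded `strlen` on the returned pointer stays within the checked range.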