
@gyuheon0h (Contributor) commented Oct 16, 2025

What does this PR do?
We want to collect runtime frames for the crashtracker. This approach mirrors how the profiler accesses the call stack.
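The core idea is a callback that the Ruby extension registers up front and that the crashtracker invokes after a crash, handing it an emit_frame function to call once per frame. A minimal sketch of that registration pattern in plain C, so it compiles on its own. runtime_frame, register_runtime_frame_callback and collect_runtime_stack are hypothetical names, not libdatadog's API (the PR itself calls ddog_crasht_register_runtime_frame_callback, whose frame type isn't shown here), and the fixed frame table stands in for walking the real VM stack.

```c
#include <stddef.h>

/* Hypothetical frame record modelled on the crash report fields.
 * line == 0 means "no line info", as for "<C extension>" frames. */
typedef struct {
  const char *file;
  const char *function;
  unsigned line;
} runtime_frame;

/* Supplied by the crash reporter; called once per frame. */
typedef void (*emit_frame_fn)(const runtime_frame *frame);
/* Supplied by the runtime (here: the Ruby extension); walks its own stack. */
typedef void (*runtime_stack_callback_fn)(emit_frame_fn emit_frame);

/* One registration slot, written once at init (before any crash), so the
 * crash-time path only has to read a pointer. */
static runtime_stack_callback_fn registered_callback = NULL;

void register_runtime_frame_callback(runtime_stack_callback_fn callback) {
  registered_callback = callback;
}

/* Crash-time side: returns 1 if a callback was registered and invoked. */
int collect_runtime_stack(emit_frame_fn emit_frame) {
  if (registered_callback == NULL) return 0;
  registered_callback(emit_frame);
  return 1;
}

/* Example runtime callback: emits innermost-first from a fixed table,
 * standing in for walking the real VM stack. */
static void example_runtime_callback(emit_frame_fn emit_frame) {
  static const runtime_frame frames[] = {
    {"<C extension>", "free", 0},
    {"test.rb", "final_crash_point", 248},
    {"test.rb", "<main>", 267},
  };
  for (size_t i = 0; i < sizeof(frames) / sizeof(frames[0]); i++) {
    emit_frame(&frames[i]);
  }
}

/* Example emitter: counts frames and keeps copies of the first and last. */
static size_t emitted_count = 0;
static runtime_frame first_frame, last_frame;

static void recording_emitter(const runtime_frame *frame) {
  if (emitted_count == 0) first_frame = *frame;
  last_frame = *frame;
  emitted_count++;
}
```

Because registration happens once at init, nothing needs to be allocated or looked up after the crash; the crash-time path just checks and calls a stored function pointer.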

Motivation:

Change log entry

Additional Notes:

How to test the change?

Run a Ruby program with the crashtracker initialized. Runtime stacks should be visible in the experimental section of the crash report.

You can also run the test "Ruby and C method runtime stack capture" and inspect the runtime stack trace it outputs:

```json
{
  "format": "Datadog Runtime Callback 1.0",
  "frames": [
    {
      "file": "<C extension>",
      "function": "free"
    },
    {
      "file": "test.rb",
      "function": "final_crash_point",
      "line": 248
    },
    {
      "file": "<C extension>",
      "function": "times"
    },
    {
      "file": "test.rb",
      "function": "final_crash_point",
      "line": 247
    },
.....
    {
      "file": "<C extension>",
      "function": "fork"
    },
    {
      "file": "test.rb",
      "function": "<main>",
      "line": 267
    }
  ]
}
```
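In the report above, Ruby frames carry a line number while "<C extension>" frames omit it. A minimal sketch of serializing one frame into that shape, assuming a hypothetical runtime_frame struct where line == 0 means "no line" (Ruby line numbers start at 1). The real report is presumably assembled on the crashtracker side, so this only illustrates the format. Escaping covers just quotes and backslashes, and output is truncated (never overflowed) when the buffer is too small.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

typedef struct {
  const char *file;
  const char *function;
  unsigned line; /* 0 = no line info, as for "<C extension>" frames */
} runtime_frame;

/* Minimal bounded writer: silently truncates, always NUL-terminates. */
typedef struct {
  char *buf;
  size_t cap;
  size_t len;
} json_writer;

static void jw_putc(json_writer *w, char c) {
  if (w->len + 1 < w->cap) { /* keep one byte for the terminator */
    w->buf[w->len++] = c;
    w->buf[w->len] = '\0';
  }
}

static void jw_puts(json_writer *w, const char *s) {
  while (*s) jw_putc(w, *s++);
}

/* Writes a JSON string literal, escaping only '"' and '\\'.
 * Assumption: names contain no control characters. */
static void jw_put_string(json_writer *w, const char *s) {
  jw_putc(w, '"');
  for (; *s; s++) {
    if (*s == '"' || *s == '\\') jw_putc(w, '\\');
    jw_putc(w, *s);
  }
  jw_putc(w, '"');
}

/* Serializes one frame; the "line" key is omitted when line == 0.
 * Returns the number of bytes written (excluding the terminator). */
size_t format_frame_json(const runtime_frame *f, char *out, size_t cap) {
  if (cap == 0) return 0;
  json_writer w = {out, cap, 0};
  out[0] = '\0';
  jw_puts(&w, "{\"file\":");
  jw_put_string(&w, f->file);
  jw_puts(&w, ",\"function\":");
  jw_put_string(&w, f->function);
  if (f->line != 0) {
    char num[16];
    snprintf(num, sizeof(num), "%u", f->line);
    jw_puts(&w, ",\"line\":");
    jw_puts(&w, num);
  }
  jw_putc(&w, '}');
  return w.len;
}
```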

@gyuheon0h requested review from a team as code owners October 16, 2025 09:34
@gyuheon0h marked this pull request as draft October 16, 2025 09:34
github-actions bot commented Oct 16, 2025

👋 Hey @DataDog/ruby-guild, please fill in the "Change log entry" section in the pull request description.

If the changes need to be present in CHANGELOG.md, you can state it this way:

**Change log entry**

Yes. A brief summary to be placed into the CHANGELOG.md

(possible answers Yes/Yep/Yeah)

Or you can opt out like this:

**Change log entry**

None.

(possible answers No/Nope/None)

Visited at: 2025-12-15 05:33:52 UTC

github-actions bot added the core label (Involves Datadog core libraries) on Oct 16, 2025

github-actions bot commented Oct 16, 2025

Typing analysis

This PR does not change typing compared to the base branch.

datadog-datadog-prod-us1 bot (Contributor) commented Oct 20, 2025

✅ Tests

🎉 All green!

❄️ No new flaky tests detected
🧪 All tests passed

🎯 Code Coverage
Patch Coverage: 54.17%
Overall Coverage: 95.24% (-0.01%)


This comment will be updated automatically if new data arrives.
🔗 Commit SHA: 0b05d48

@gyuheon0h force-pushed the gyuheon0h/prof-12743-runtime-stack-callback branch from e628919 to 564c828 on October 24, 2025 15:37

@gyuheon0h (Contributor, Author) commented Oct 24, 2025

This stack of pull requests is managed by Graphite.

@gyuheon0h force-pushed the gyuheon0h/prof-12743-runtime-stack-callback branch 3 times, most recently from f102a46 to 53afc20 on October 31, 2025 15:19

@gyuheon0h (Contributor, Author) commented Nov 8, 2025

Paper trail

[ ] Checking for unreasonable string sizes
[ ] Checking that string pointers point to valid strings
[ ] Checking that control frames are valid
[ ] Checking that the iseq is valid and its instruction size is not unreasonable
[ ] Validating that pointers are readable using mincore
[ ] Checking for recursive frames
[ ] Strings can have different representations; take that into account
[ ] Ruby apps commonly have very deep stacks: the profiler collects 400 frames by default, and we've seen GitHub get close to 600. Handle this.
[ ] Pay attention to the structure that keeps the bytecode-to-line mapping
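Two of these items, validating pointers with mincore and rejecting unreasonable string sizes, can be sketched as below. This assumes Linux/glibc. is_pointer_readable and align_down match helper names in the diff quoted later in this thread, but these bodies are an illustrative guess, not the PR's code; init_page_size and bounded_strlen are hypothetical. Note that mincore only tells you a page is mapped, not that it is readable, so a PROT_NONE mapping would still pass.

```c
#define _DEFAULT_SOURCE /* exposes mincore() and MAP_ANONYMOUS in glibc headers */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Cached at init time: sysconf() is not on POSIX's async-signal-safe list. */
static uintptr_t page_size = 0;

void init_page_size(void) {
  page_size = (uintptr_t) sysconf(_SC_PAGESIZE);
}

/* Round x down to a multiple of align (align must be a power of two). */
static inline uintptr_t align_down(uintptr_t x, uintptr_t align) {
  return x & ~(align - 1u);
}

/* True if every page overlapping [ptr, ptr + size) is mapped.
 * mincore() never dereferences the range; it fails with ENOMEM if any page
 * in it is unmapped, so it is safe to call on garbage pointers.
 * Caveat: mapped is not the same as readable (PROT_NONE pages pass). */
static bool is_pointer_readable(const void *ptr, size_t size) {
  if (ptr == NULL || size == 0 || page_size == 0) return false;
  uintptr_t begin = (uintptr_t) ptr;
  uintptr_t end = begin + size;
  if (end < begin) return false; /* range wraps around the address space */
  unsigned char residency; /* mincore writes one byte per page queried */
  for (uintptr_t page = align_down(begin, page_size); page < end; page += page_size) {
    if (mincore((void *) page, page_size, &residency) != 0) return false;
  }
  return true;
}

/* Length of the NUL-terminated string at s, or -1 if s is unmapped, runs into
 * an unmapped page, or has no NUL within max_len bytes (an "unreasonable" size).
 * Each page is checked once, before its first byte is read. */
static long bounded_strlen(const char *s, size_t max_len) {
  if (!is_pointer_readable(s, 1)) return -1;
  uintptr_t next_unchecked = align_down((uintptr_t) s, page_size) + page_size;
  for (size_t i = 0; i < max_len; i++) {
    uintptr_t addr = (uintptr_t) s + i;
    if (addr == next_unchecked) {
      if (!is_pointer_readable((const void *) addr, 1)) return -1;
      next_unchecked += page_size;
    }
    if (s[i] == '\0') return (long) i;
  }
  return -1;
}
```

The page size is cached at init so that the crash-time path only issues mincore calls, which in glibc are a thin wrapper over the system call.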

@gyuheon0h force-pushed the gyuheon0h/prof-12743-runtime-stack-callback branch 6 times, most recently from 77d1391 to 1ebb325 on November 17, 2025 02:56
@gyuheon0h force-pushed the gyuheon0h/prof-12743-runtime-stack-callback branch from 1ebb325 to 4e51bf3 on November 17, 2025 03:40
@gyuheon0h force-pushed the gyuheon0h/prof-12743-runtime-stack-callback branch 2 times, most recently from 3c782d0 to 8d957bf on November 17, 2025 03:48
github-actions bot added the profiling label (Involves Datadog profiling) on Nov 17, 2025
@gyuheon0h marked this pull request as ready for review November 17, 2025 03:52
@gyuheon0h changed the title from "[WIP][crashtracking] Runtime stack collection callback registration" to "[crashtracking] Runtime stack collection callback registration" Nov 17, 2025
@gyuheon0h changed the title from "[crashtracking] Runtime stack collection callback registration" to "[PROF-12743] Runtime stack collection callback registration" Nov 17, 2025

chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@gyuheon0h force-pushed the gyuheon0h/prof-12743-runtime-stack-callback branch from 3180502 to 4dfcbe2 on December 11, 2025 19:38
@gyuheon0h requested a review from ivoanjo December 11, 2025 20:30

@ivoanjo (Member) left a comment

Thanks for the fixes! I think this is almost ready -- I've left one final round of comments and I think we'll be able to merge this next week and ship it before x-mas! ;)


```c
if (crashtracker_thread_data_type == NULL) return;

rb_thread_t *th = (rb_thread_t *) rb_check_typeddata(current_thread, crashtracker_thread_data_type);
```

Member

This is not quite correct, actually. Specifically -- if current_thread is not a thread, this will attempt to raise a Ruby-level exception, which is not caught at all here.

I was curious to see what would happen, so I made this change to break it:

```diff
@@ -65,6 +65,8 @@ static frame_info runtime_stack_buffer[512];
 
 long syscall(long number, ...);
 
+static VALUE array;
+
 // align down to power of two
 static inline uintptr_t align_down(uintptr_t x, uintptr_t align) {
   return x & ~(align - 1u);
@@ -175,7 +177,7 @@ static void ruby_runtime_stack_callback(
 
   if (crashtracker_thread_data_type == NULL) return;
 
-  rb_thread_t *th = (rb_thread_t *) rb_check_typeddata(current_thread, crashtracker_thread_data_type);
+  rb_thread_t *th = (rb_thread_t *) rb_check_typeddata(array, crashtracker_thread_data_type);
   if (!th || !is_pointer_readable(th, sizeof(*th))) return;
 
   // Use the profiling helper to gather frames into our static buffer.
@@ -245,6 +247,9 @@ static void ruby_runtime_stack_callback(
 }
 
 void crashtracking_runtime_stacks_init(void) {
+  rb_global_variable(&array);
+  array = rb_ary_new();
+
   if (crashtracker_thread_data_type == NULL) {
     VALUE current_thread = rb_thread_current();
     if (current_thread == Qnil) return;
@@ -258,4 +263,3 @@ void crashtracking_runtime_stacks_init(void) {
   // Register immediately so Ruby doesn't need to manage this explicitly.
   ddog_crasht_register_runtime_frame_callback(ruby_runtime_stack_callback);
 }
-
```

With this change, the crashtracker seems to time out in practice.

TL;DR: I suggest introducing an is_thread(VALUE) helper in private_vm_api_access.c (maybe refactoring the code from thread_struct_from_object so it can be shared?) that uses rb_typeddata_is_kind_of instead and can be used here.

Why add it to private_vm_api_access.c instead of here? I think we're very close to not needing any private headers in this file, and at this point we might as well take advantage of code sharing with the profiler.

@gyuheon0h (Contributor, Author) Dec 12, 2025

Awesome, thanks for this catch.

I'm not sure we can make it a helper in private_vm_api_access.c if that helper has to fetch the Ruby thread type via rb_thread_current(). That code would run post-crash, so we would basically be comparing the types of two post-crash rb_thread_current() calls, which is meaningless.

I think a nicer approach is to call rb_typeddata_is_kind_of directly within the callback, like so:

```c
static void ruby_runtime_stack_callback(
  void (*emit_frame)(const ddog_crasht_RuntimeStackFrame*)
) {
  // Grab the Ruby thread we crashed on; crashtracker only runs once.
  VALUE current_thread = rb_thread_current();
  if (current_thread == Qnil) return;

  if (crashtracker_thread_data_type == NULL) return;

  if (!rb_typeddata_is_kind_of(current_thread, crashtracker_thread_data_type)) {
    emit_placeholder_frame(emit_frame, "<runtime stack not found>");
    return;
  }

  // ........
}
```

This way, we compare the type of the Ruby thread captured pre-crash (which we know is valid) with the type of the Ruby thread obtained post-crash (which is what we want to validate).

I also repeated your experiment of passing in a garbage value, and with this approach it does not hang ^_^
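The difference here is that rb_check_typeddata raises on a type mismatch, which is unsafe in a crash handler with nothing to rescue the exception, while rb_typeddata_is_kind_of just returns whether the object's type (or one of its parent types) matches. A pure-C model of that predicate, plus the "capture the type pre-crash, compare post-crash" idea, so it compiles without Ruby headers: data_type, object and all function names are hypothetical stand-ins for rb_data_type_t, a T_DATA object and the real helpers.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for rb_data_type_t: types can inherit via parent. */
typedef struct data_type {
  const char *name;
  const struct data_type *parent;
} data_type;

/* Hypothetical stand-in for an object header. */
typedef struct {
  bool is_typed_data;    /* models "is a T_DATA object with a data type" */
  const data_type *type; /* models RTYPEDDATA_TYPE(obj) */
} object;

/* Captured once at init, from an object known (pre-crash) to be a thread. */
static const data_type *thread_data_type = NULL;

void capture_thread_type(const object *known_thread) {
  if (known_thread != NULL && known_thread->is_typed_data) {
    thread_data_type = known_thread->type;
  }
}

/* Modelled on rb_typeddata_is_kind_of: answers true/false, never raises.
 * Walks the parent chain, as rb_typeddata_inherited_p does. */
bool typeddata_is_kind_of(const object *obj, const data_type *expected) {
  if (obj == NULL || expected == NULL || !obj->is_typed_data) return false;
  for (const data_type *t = obj->type; t != NULL; t = t->parent) {
    if (t == expected) return true;
  }
  return false;
}

/* Crash-time check: false means "emit a placeholder frame such as
 * <runtime stack not found> and return", instead of raising. */
bool crashing_thread_looks_valid(const object *current) {
  if (thread_data_type == NULL) return false;
  return typeddata_is_kind_of(current, thread_data_type);
}
```

Like the real predicate, this still dereferences the object, so it protects against an object of the wrong type, not against a wild pointer; pointer checks such as is_pointer_readable cover that case.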

Comment on lines 195 to 199

```c
const rb_iseq_t *iseq = (const rb_iseq_t *)info->as.ruby_frame.iseq;
const char *function_name = "<unknown>";
const char *file_name = "<unknown>";

if (iseq && is_pointer_readable(iseq, sizeof(rb_iseq_t))) {
```

Member

Together with my note on getting rid of rb_thread_t above, it looks to me like rb_iseq_t is the last thing in this file that needs the internal headers.

But here we only need to know about rb_iseq_t for the sizeof check. So my suggestion is to add a size_of_rb_iseq_t() function to private_vm_api_access.c and use it here, so that this file no longer has to care about the internal headers.

E.g., this would become something like:

Suggested change

```diff
-const rb_iseq_t *iseq = (const rb_iseq_t *)info->as.ruby_frame.iseq;
-const char *function_name = "<unknown>";
-const char *file_name = "<unknown>";
-if (iseq && is_pointer_readable(iseq, sizeof(rb_iseq_t))) {
+const void *iseq = (const void *)info->as.ruby_frame.iseq;
+const char *function_name = "<unknown>";
+const char *file_name = "<unknown>";
+if (iseq && is_pointer_readable(iseq, size_of_rb_iseq_t())) {
```

@gyuheon0h (Contributor, Author) Dec 12, 2025

Great idea! I'll also need to add helpers in private_vm_api_access.c for rb_iseq_base_label and rb_iseq_path, since those are private.
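The pattern both comments converge on is to keep the struct layout private to one file and export only a size accessor plus field accessors. A minimal sketch with a hypothetical fake_iseq struct; both "files" live in one block so it compiles standalone. The real rb_iseq_base_label and rb_iseq_path return Ruby VALUEs rather than C strings, and pointer_looks_readable is a simplified stand-in for is_pointer_readable.

```c
#include <stddef.h>

/* ---- "private" file (think private_vm_api_access.c) ----
 * The only place that knows the layout. fake_iseq is a hypothetical
 * stand-in for rb_iseq_t. */
struct fake_iseq {
  const char *base_label;
  const char *path;
  unsigned flags;
};

size_t size_of_fake_iseq(void) { return sizeof(struct fake_iseq); }

const char *fake_iseq_base_label(const void *iseq) {
  return ((const struct fake_iseq *) iseq)->base_label;
}

const char *fake_iseq_path(const void *iseq) {
  return ((const struct fake_iseq *) iseq)->path;
}

/* ---- "public" file (think the crashtracker file) ----
 * Needs only these prototypes (normally from a shared header), never the struct. */
size_t size_of_fake_iseq(void);
const char *fake_iseq_base_label(const void *iseq);
const char *fake_iseq_path(const void *iseq);

/* Simplified stand-in for is_pointer_readable: only rejects NULL / empty. */
static int pointer_looks_readable(const void *ptr, size_t size) {
  return ptr != NULL && size > 0;
}

/* Mirrors the suggested change: validate using the exported size, then read
 * fields only through accessors. */
void describe_frame(const void *iseq, const char **function_name, const char **file_name) {
  *function_name = "<unknown>";
  *file_name = "<unknown>";
  if (iseq && pointer_looks_readable(iseq, size_of_fake_iseq())) {
    *function_name = fake_iseq_base_label(iseq);
    *file_name = fake_iseq_path(iseq);
  }
}
```

Only the private half needs the struct definition, so only that file has to include the internal headers; the other file works purely through opaque pointers.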

@gyuheon0h force-pushed the gyuheon0h/prof-12743-runtime-stack-callback branch from e751d03 to 3aae99b on December 12, 2025 21:14
@gyuheon0h requested a review from ivoanjo December 12, 2025 21:17
@gyuheon0h force-pushed the gyuheon0h/prof-12743-runtime-stack-callback branch from 3aae99b to 0b05d48 on December 12, 2025 21:25

Labels: core (Involves Datadog core libraries), profiling (Involves Datadog profiling)


4 participants