
Conversation

kiya00 (Collaborator) commented Nov 20, 2025

Before submitting
  • Was this discussed/approved via a Github issue? (no need for typos and docs improvements)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure to update the docs?
  • Did you write any new necessary tests?

What does this PR do?

Fixes #2332.

Support CUDA stream operators in ThunderFX. This PR adds the patch mentioned in #2332 and a follow-up fix from #2750 (comment).
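For context, a minimal repro in the spirit of #2332 might look like the sketch below. This is illustrative only, not the exact test added in this PR; it assumes the ThunderCompiler backend that appears later in this discussion.

import torch
from thunder.dynamo import ThunderCompiler

def fn(x):
    # A CUDA stream object created inside the traced function flows
    # through the FX graph, which is what previously broke ThunderFX.
    s = torch.cuda.Stream()
    s.wait_stream(torch.cuda.current_stream())
    with torch.cuda.stream(s):
        y = x * 2
    torch.cuda.current_stream().wait_stream(s)
    return y

cfn = torch.compile(fn, backend=ThunderCompiler())
print(cfn(torch.randn(4, device="cuda")))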

crcrpar (Collaborator) left a comment:

I wouldn't be an appropriate reviewer for this, as I wrote part of it.

riccardofelluga (Collaborator) left a comment:

Is there a small repro that we can use to test this PR?

example_value = node.meta["example_value"]
if isinstance(example_value, torch.cuda.Stream):
node.meta["example_value"] = None
node.meta["_original_stream_type"] = type(example_value).__name__
Collaborator:

What is this variable used for?

kiya00 (Author):

I guess this variable is something Torch checks, so we set it manually. Maybe @crcrpar knows more about it?


kshitij12345 (Collaborator) commented Nov 24, 2025:

Good question. I tried to find a reference to _original_stream_type in PyTorch and found nothing; I think this is dead code.

Also, I don't think we need _preprocess_cuda_stream_objects, as at the moment we want to pass the stream to the fallback (which should just be able to handle it).

With _preprocess_cuda_stream_objects removed, I don't think we even need the new fallback path for NotImplementedError and AssertionError in LazyInductorModule.

kiya00 (Author):

With _preprocess_cuda_stream_objects removed, I don't think we even need the new fallback path for NotImplementedError and AssertionError in LazyInductorModule.

Did you try it with the sglang models? I seem to get the following error when _preprocess_cuda_stream_objects is commented out while running gpt-oss (the same error @crcrpar hit when he added the function):

  File "/sgl-workspace/sglang/python/sglang/srt/model_executor/cuda_graph_runner.py", line 414, in __init__
    raise Exception(
Exception: Capture cuda graph failed: backend='<thunder.dynamo.compiler.ThunderCompiler object at 0xffda8e9ab470>' raised:
AssertionError: cannot extract sympy expressions from <torch.cuda.Stream device=cuda:0 cuda_stream=0x1552ee20> <class 'torch.cuda.streams.Stream'>

Collaborator:

I tried with the new test in the PR, not via sglang. In that case, it makes sense to keep _preprocess_cuda_stream_objects. Thank you for checking.

kiya00 (Author) commented Nov 21, 2025

Is there a small repro that we can use to test this PR?

Yeah, I added a test case for it.
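For reference, a hedged sketch of what such a test might look like; the names and exact assertions are illustrative, not necessarily the test added in this PR.

import pytest
import torch
from thunder.dynamo import ThunderCompiler

@pytest.mark.skipif(not torch.cuda.is_available(), reason="requires CUDA")
def test_cuda_stream_wait_stream():
    def fn(x):
        s = torch.cuda.Stream()
        s.wait_stream(torch.cuda.current_stream())
        with torch.cuda.stream(s):
            y = x + 1
        torch.cuda.current_stream().wait_stream(s)
        return y

    cfn = torch.compile(fn, backend=ThunderCompiler())
    x = torch.randn(8, device="cuda")
    # The compiled function should match eager execution here.
    torch.testing.assert_close(cfn(x), fn(x))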

kshitij12345 (Collaborator) left a comment:

I think just making sure we cause a split when we encounter a torch.cuda.Stream object should let the test case (and other relevant code/models) pass.
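A minimal sketch of that idea, assuming a node-support predicate in the splitter; the function name is illustrative, not Thunder's actual API.

import torch

def node_requires_split(node: torch.fx.Node) -> bool:
    # Illustrative check: treat any node whose example value is a CUDA
    # stream as unsupported, so the splitter routes it (and its users)
    # to the Inductor/eager fallback instead of the Thunder submodule.
    example_value = node.meta.get("example_value")
    return isinstance(example_value, torch.cuda.Stream)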


kshitij12345 (Collaborator) left a comment:

LGTM, thanks @kiya00

kiya00 (Author) commented Nov 25, 2025

Hi @KaelanDt, could you help review this?


Development

Successfully merging this pull request may close these issues.

thunderfx fails on torch.cuda.Stream.wait_stream
