
Conversation

@shino16 (Collaborator) commented Nov 18, 2025:

Builds upon #2745. Closes #2677.

The following command now runs successfully:

torchrun --nproc-per-node 2 thunder/benchmarks/benchmark_inference.py \
  --num-iterations 10 --mode thunder --thunder-cache "symbolic values"

shino16 force-pushed the benchmark_inference.py-dynamic-shapes branch 2 times, most recently from 36dee09 to fee3d7b (November 18, 2025 02:04)
shino16 changed the title from "Complete multi-GPU dynamic shape support of benchmark_inference.py" to "[Experimental] Insert prims.shape to ensure NumberProxy is materialized for all dynamic shapes" (Nov 18, 2025)
Comment on lines +2013 to +2023
shape_attr = None
if history is not None:
    # Provenance for `t.shape`, derived from the tensor's own history.
    shape_attr = ProvenanceRecord(
        PseudoInst.LOAD_ATTR, inputs=[copy.copy(history), wrap_const("shape").provenance]
    )

def _dim_history(idx: int) -> ProvenanceRecord | None:
    # Provenance for `t.shape[idx]`; None when the tensor has no history.
    if shape_attr is None:
        return None
    return ProvenanceRecord(PseudoInst.BINARY_SUBSCR, inputs=[shape_attr, wrap_const(idx).provenance])

Collaborator commented:

As suggested in another comment, I think this should be pulled out into a separate PR. When that happens...

This is worrisome. The error we are encountering is that history is None, right? I don't like the idea that we are creating a TensorProxy for a tensor that doesn't have a history... It also doesn't feel right that we are propagating that lack of history forward. In what case does the input tensor not have a history?

@shino16 (Collaborator, Author) commented Nov 18, 2025:

This is the traceback where a TensorProxy is created without history:

...
  File "/opt/pytorch/lightning-thunder/thunder/torch/experimental/dtensor_torch_and_prims.py", line 403, in dtensor_from_local_meta
    res = proxify_dtensor(res)
          ^^^^^^^^^^^^^^^^^^^^
  File "/opt/pytorch/lightning-thunder/thunder/torch/experimental/dtensor_utils.py", line 21, in wrapper
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/opt/pytorch/lightning-thunder/thunder/torch/experimental/dtensor_proxy.py", line 165, in proxify_dtensor
    local_tensor_proxy = proxy(t, history=history)
                         ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/pytorch/lightning-thunder/thunder/core/proxies.py", line 2104, in proxy
    return tensorproxy(x, name=name, history=history)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/pytorch/lightning-thunder/thunder/core/proxies.py", line 2025, in tensorproxy
    raise Exception(msg)

This comes from a DTensor meta function, which is unrelated to provenance tracking, so propagating the lack of history in such cases should be fine.

Collaborator commented:

Correct me if I'm wrong, but I thought that meta functions were frequently invoked during the tracing and provenance tracking process... Each time a primitive symbol is executed during tracing, it invokes its meta. Or did you mean something else?

In this prim, we've got the argument x. Isn't it a TensorProxy, probably with some history?

@shino16 (Collaborator, Author) commented Nov 19, 2025:

My point was that meta functions in general do not return a proxy with history.

def shallow_copy_meta(a: TensorProxy, /) -> TensorProxy:
    return TensorProxy(like=a)

Here, Symbols are wrapped by interpreter_needs_wrap, which wraps the output of meta functions in a WrappedValue and attaches a ProvenanceRecord. So, in my understanding, meta functions under Symbol.__call__ and the proxies they deal with do not need to know about provenance at all.
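
To illustrate the pattern (a minimal, hypothetical sketch; Provenance and Wrapped below are stand-ins for Thunder's ProvenanceRecord and WrappedValue, not the actual implementation):

from dataclasses import dataclass, field
from typing import Any, Callable

@dataclass
class Provenance:  # stand-in for ProvenanceRecord
    op: str
    inputs: list = field(default_factory=list)

@dataclass
class Wrapped:  # stand-in for WrappedValue
    value: Any
    provenance: Provenance | None = None

def needs_wrap(meta_fn: Callable) -> Callable:
    # The interpreter-side wrapper records provenance around the call;
    # the meta function itself only returns a bare, history-less result.
    def wrapped(*args: Wrapped) -> Wrapped:
        result = meta_fn(*(a.value for a in args))
        return Wrapped(result, Provenance("CALL", [a.provenance for a in args]))
    return wrapped

With this framing, shallow_copy_meta above never needs to touch history; the wrapper supplies it.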

I was unsure, so I inserted print(f"{x.history = }").

Patch
diff --git a/thunder/torch/experimental/dtensor_torch_and_prims.py b/thunder/torch/experimental/dtensor_torch_and_prims.py
index dfb090c4..0b2e1285 100644
--- a/thunder/torch/experimental/dtensor_torch_and_prims.py
+++ b/thunder/torch/experimental/dtensor_torch_and_prims.py
@@ -388,20 +388,21 @@ if torch.distributed.is_available():
 
     def dtensor_from_local_meta(
         x,
         mesh,
         placements,
         *,
         run_check: bool = False,
         shape: torch.Size | None = None,
         stride: tuple[int, ...] | None = None,
     ):
+        print(f"{x.history = }")
         res = run_with_fake_tensor(
             DTensor.from_local, x, mesh, placements, run_check=run_check, shape=shape, stride=stride
         )
         from thunder.torch.experimental.dtensor_proxy import proxify_dtensor

It only printed x.history = None.

@shino16 (Collaborator, Author) commented Nov 19, 2025:

I tried thunder.jit(lambda x: x.exp()) and saw that the proxy input of exp's meta function did have history. I am not sure why this case is different.

Collaborator commented:

The existence of history depends on where in the interpretation/compilation pipeline a tensor proxy is created. If a proxy is created while the function is being interpreted (in interpreter.py), we should expect to see histories populated with ProvenanceRecords; these records are connected with the creation of the prologue trace. If a TensorProxy is created in one of the many optimization passes of the compilation stage, there is no need to keep creating ProvenanceRecords, so histories are often empty. From the backtrace you pasted, I can't tell whether this error is happening in the interpretation stage, where there should be a history, or in the compilation stage, where we don't care.
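
A quick way to probe this, in the spirit of the print-debugging above (a hypothetical helper; it only assumes the history attribute shown in the patch):

def created_during_interpretation(p) -> bool:
    # Per the explanation above: proxies created while interpreter.py
    # traces the user function carry ProvenanceRecords in their history;
    # proxies created by later optimization passes often do not.
    return getattr(p, "history", None) is not None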

@shino16 (Collaborator, Author) commented:

Thank you for the information! I don't remember the entire stack trace, but what I do remember is that dtensor_from_local_meta was never called with a TensorProxy that had history. I think it is related to how local tensor proxies are created from DTensors.

In my understanding, when the history of a tensor proxy is not recorded, its shape can be history-less as well.

Comment on the following lines:

) -> Any:
    nv_fill_value = getnv(fill_value, fd, lc_to_nv_map)
    nvdtype = lcdtype_to_nvdtype(dtype)
    nv_shape = getnv(shape, fd, lc_to_nv_map)
Collaborator commented:

Again, the changes in this file should be a separate PR. When that happens...

Should this be nv_shape = getnv(shape, fd, lc_to_nv_map, inline_number=True) as Kshiteej suggests in #2677 (comment)?

@shino16 (Collaborator, Author) commented:

Yes. I wanted to first figure out what the inline_number option does; since we're dealing with metadata here, I have decided to put inline_number=True back.
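
For reference, the restored line then reads (matching the suggestion quoted above):

nv_shape = getnv(shape, fd, lc_to_nv_map, inline_number=True)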

Successfully merging this pull request may close: Symbolic Values support for benchmark_inference.py