add magi-compiler support to qwen_image by STwangyingrui · Pull Request #1089 · ModelTC/LightX2V

STwangyingrui · 2026-05-22T09:40:52Z

optimize step1 time(magi compile time): use block graph instead of model graph
add magi-compiler support to qwen_image

gemini-code-assist

Code Review

This pull request integrates magi_compiler into the neopp and qwen_image inference pipelines to enable subgraph-based compilation. It introduces a global switch for custom ops, registers several Triton and RoPE kernels as magi custom ops, and refactors transformer decoder layers to support compilation boundaries. Feedback includes concerns regarding the overhead of creating tensors inside hot loops without specifying devices, the global side effects of modifying torch._dynamo.config at the module level, and potential performance regressions from unnecessary tensor cloning in the standard execution path.

gemini-code-assist · 2026-05-22T09:44:01Z

+                cu_seqlens_q=torch.tensor([0, seq_len_q], dtype=torch.int32),
+                cu_seqlens_kv=torch.tensor([0, seq_len_k], dtype=torch.int32),


Creating cu_seqlens tensors inside the _compute_attn method (which is called for every layer) introduces unnecessary overhead. Additionally, these tensors are created on the CPU by default, which may cause device synchronization or host-to-device copies if the attention implementation expects them on the GPU.

Consider creating these tensors once outside the layer loop or explicitly specifying the device.

Suggested change

-                cu_seqlens_q=torch.tensor([0, seq_len_q], dtype=torch.int32),
-                cu_seqlens_kv=torch.tensor([0, seq_len_k], dtype=torch.int32),
+                cu_seqlens_q=torch.tensor([0, seq_len_q], device=query_states.device, dtype=torch.int32),
+                cu_seqlens_kv=torch.tensor([0, seq_len_k], device=query_states.device, dtype=torch.int32),
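The other option mentioned above, building the tensors once rather than per layer, could look roughly like the sketch below. The helper name `cu_seqlens_for` and the cache size are hypothetical and not part of the PR. The helper is meant to be called outside the layer loop (or outside any compiled region), with the result passed down to `_compute_attn`.

```python
from functools import lru_cache

import torch


@lru_cache(maxsize=32)
def cu_seqlens_for(seq_len: int, device: str) -> torch.Tensor:
    """Return the [0, seq_len] boundary tensor for a single unpadded sequence.

    Hypothetical helper: results are cached per (seq_len, device), so repeated
    calls with the same arguments reuse one tensor instead of allocating a new
    one for every layer.
    """
    return torch.tensor([0, seq_len], dtype=torch.int32, device=device)


# Built once, before iterating over the decoder layers.
cu_q = cu_seqlens_for(16, "cpu")
cu_kv = cu_seqlens_for(24, "cpu")
```

Because the cached tensor is shared between callers, it should be treated as read-only.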

gemini-code-assist · 2026-05-22T09:44:02Z

+        _dynamo.config.capture_scalar_outputs = False
+        _dynamo.config.specialize_int = False
+        _dynamo.config.automatic_dynamic_shapes = True


Modifying torch._dynamo.config at the class definition level has global side effects that occur as soon as the module is imported. This can lead to unexpected behavior in other parts of the application that rely on default Dynamo settings.

It is better to set these configurations within the __init__ method or a specific setup function, or use a context manager if supported by the environment.
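One way to scope such settings is a small context manager that records the previous values and restores them on exit. The sketch below is generic: `patched_attrs` is a hypothetical helper, and a `SimpleNamespace` stands in for `torch._dynamo.config` so the example runs without PyTorch. The "default" values shown are placeholders, not PyTorch's actual defaults, and the PR does not say whether scoped settings are sufficient for magi_compiler.

```python
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def patched_attrs(config, **overrides):
    """Temporarily set attributes on `config`, restoring the old values on exit."""
    saved = {name: getattr(config, name) for name in overrides}
    try:
        for name, value in overrides.items():
            setattr(config, name, value)
        yield config
    finally:
        # Restore even if the body raised.
        for name, value in saved.items():
            setattr(config, name, value)


# Stand-in object; in real code this would be torch._dynamo.config (assumption).
fake_dynamo_config = SimpleNamespace(
    capture_scalar_outputs=True,
    specialize_int=True,
    automatic_dynamic_shapes=False,
)

with patched_attrs(
    fake_dynamo_config,
    capture_scalar_outputs=False,
    specialize_int=False,
    automatic_dynamic_shapes=True,
):
    # Compilation would happen here, under the overridden settings.
    inside = (
        fake_dynamo_config.capture_scalar_outputs,
        fake_dynamo_config.specialize_int,
        fake_dynamo_config.automatic_dynamic_shapes,
    )
```

With this shape, importing the module changes nothing globally; the overrides apply only while the `with` block is active.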

gemini-code-assist · 2026-05-22T09:44:02Z

+    query = xq.reshape(L, H * D).contiguous().clone()
+    key = xk.reshape(L, H * D).contiguous().clone()


The addition of .clone() here introduces extra memory allocation and copy operations. While this might be necessary to avoid input mutation issues when using magi_compiler custom ops (which often require functional behavior for subgraph boundaries), it results in a performance regression for the standard eager mode execution path.

If mutation is safe for the caller, consider making the clone conditional or using a registration method that supports in-place operations without excessive dispatch overhead.
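A conditional clone might be sketched as follows. The function name `flatten_heads` and the flag `needs_private_copy` are hypothetical; how the real code would decide when a copy is required (for example from a `use_magi_compile` option) is an assumption. The sketch also skips the clone when `reshape(...).contiguous()` has already produced a fresh buffer, which happens for non-contiguous inputs.

```python
import torch


def flatten_heads(x: torch.Tensor, L: int, H: int, D: int, *, needs_private_copy: bool = False) -> torch.Tensor:
    """Flatten (L, H, D) to (L, H * D), copying only when a private buffer is required.

    `needs_private_copy` is a hypothetical flag: set it when a downstream op
    must not alias (and possibly mutate) the caller's tensor.
    """
    flat = x.reshape(L, H * D).contiguous()
    # For contiguous input, reshape returns a view that shares memory with x.
    # For non-contiguous input, reshape already copied, so no extra clone is needed.
    if needs_private_copy and flat.data_ptr() == x.data_ptr():
        flat = flat.clone()
    return flat


xq = torch.arange(24, dtype=torch.float32).reshape(2, 3, 4)  # L=2, H=3, D=4
eager_query = flatten_heads(xq, 2, 3, 4)                             # view, no copy
compiled_query = flatten_heads(xq, 2, 3, 4, needs_private_copy=True)  # private copy
```

The eager path then pays no extra allocation, while the compiled path still receives a tensor it can safely treat as its own.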

STwangyingrui added 6 commits May 14, 2026 10:25

add magi_compiler support to neopp

89a60da

Merge branch 'main' into yr/magi_compiler

c84381a

add option of use_magi_compile

d62db69

optimize step1 time(magi compile time): use block graph instead of model graph

d51efde

add magi-compiler support to qwen_image

ee2aed0

merge main and fix conflict

d6373d5

gemini-code-assist Bot reviewed May 22, 2026

STwangyingrui wants to merge 6 commits into main from yr/magi_compiler
