[feat]: support hy3d-2.1 by Watebear · Pull Request #1088 · ModelTC/LightX2V

Watebear · 2026-05-22T09:30:08Z

No description provided.

gemini-code-assist

Code Review

This pull request integrates the Hunyuan3D-2.1 image-to-3D-mesh generation pipeline, introducing new model runners, DiT network components with Mixture-of-Experts (MoE) support, flow-matching schedulers, and VAE-based mesh decoding utilities. It also includes image preprocessing with background removal and a torchvision compatibility fix. The review feedback identifies several implementation issues, including a dictionary iteration bug in the configuration logic, unreachable code in the conditioner, and potential crashes in the image processor. Furthermore, several performance optimizations are recommended for the MoE inference and runner modules to avoid unnecessary device synchronizations and excessive memory management overhead.

gemini-code-assist · 2026-05-22T09:36:50Z

+        else:
+            raise ValueError(f"{config_file} must be `.yaml` file or it contains `base_config` key.")
+
+        config_file = {key: value for key, value in config_file if key != "base_config"}


Iterating directly over a DictConfig (or any dict-like object) yields only the keys. Attempting to unpack it as key, value will raise a ValueError. You should use .items() to iterate over key-value pairs.

Suggested change

-        config_file = {key: value for key, value in config_file if key != "base_config"}
+        config_file = {key: value for key, value in config_file.items() if key != "base_config"}
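
The failure mode can be reproduced without OmegaConf. The sketch below uses a plain dict as a stand-in for the loaded config; the keys and values are made up for illustration and are not taken from the PR.

```python
# A plain dict stands in for the loaded config; keys and values are hypothetical.
config = {"base_config": "base.yaml", "model": "hy3d", "steps": 50}

# Iterating a mapping yields its keys, so each key string is what gets unpacked.
# "base_config" has more than two characters, so unpacking it into two names fails.
try:
    {key: value for key, value in config}
    failed = False
except ValueError:
    failed = True

# .items() yields (key, value) pairs, which is what the comprehension expects.
filtered = {key: value for key, value in config.items() if key != "base_config"}
print(failed, filtered)
```

Note that a key of exactly two characters would unpack silently into two single characters, so the bug is not guaranteed to surface as an exception for every config.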

gemini-code-assist · 2026-05-22T09:36:50Z

+        if self.disable_drop:
+            return outputs
+        else:
+            random_p = torch.rand(len(image), device="cuda")


The device is hardcoded to "cuda". This will fail on other devices (e.g., CPU, MPS) or if the model is on a specific GPU index. It's better to use the device of the input tensor.

Suggested change

-            random_p = torch.rand(len(image), device="cuda")
+            random_p = torch.rand(len(image), device=image.device)
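
A minimal sketch of the device-agnostic version, assuming `image` is a batched `torch.Tensor` as the suggestion implies. The helper name `sample_drop_probs` and the tensor shape are hypothetical and only serve the illustration.

```python
import torch


def sample_drop_probs(image: torch.Tensor) -> torch.Tensor:
    """Draw one uniform sample per batch item on the same device as `image`."""
    # Using image.device avoids assuming that CUDA is available or which GPU index is in use.
    return torch.rand(len(image), device=image.device)


image = torch.zeros(4, 3, 8, 8)  # hypothetical CPU batch of 4 images
random_p = sample_drop_probs(image)
print(random_p.shape, random_p.device)
```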

gemini-code-assist · 2026-05-22T09:36:51Z

+        outputs = {
+            "main": self.main_image_encoder(image, mask=mask, **kwargs),
+        }
+        return outputs


This block of code is unreachable because the function returns on line 248 or 253.

gemini-code-assist · 2026-05-22T09:36:51Z

+        image = Image.open(image_path)
+        if image.mode == "RGB" and self.enable_rembg:
+            if self.rembg is None:
+                self.rembg = BackgroundRemover()
+            return self.rembg(image.convert("RGB"))
+        return image.convert("RGBA")


The image file is opened but not explicitly closed. Using a context manager ensures the file handle is released promptly.

Suggested change

-        image = Image.open(image_path)
-        if image.mode == "RGB" and self.enable_rembg:
-            if self.rembg is None:
-                self.rembg = BackgroundRemover()
-            return self.rembg(image.convert("RGB"))
-        return image.convert("RGBA")
+    def __call__(self, image_path: str) -> Image.Image:
+        with Image.open(image_path) as image:
+            if image.mode == "RGB" and self.enable_rembg:
+                if self.rembg is None:
+                    self.rembg = BackgroundRemover()
+                return self.rembg(image.convert("RGB"))
+            return image.convert("RGBA")
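
The pattern can be tried in isolation with Pillow. The sketch below leaves out the background-removal branch and uses a hypothetical helper name, `load_rgba`; it writes a small throwaway PNG so that it runs without external files.

```python
import os
import tempfile

from PIL import Image


def load_rgba(image_path: str) -> Image.Image:
    # The context manager closes the underlying file when the block exits.
    with Image.open(image_path) as image:
        # convert() loads the pixel data and returns a new image object,
        # so the result stays usable after the file has been closed.
        return image.convert("RGBA")


# Create a tiny red test image in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.png")
    Image.new("RGB", (4, 4), (255, 0, 0)).save(path)
    result = load_rgba(path)

print(result.mode, result.size)
```

Returning from inside the `with` block is safe here because the returned image is a converted copy rather than the lazily loaded file-backed object.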

gemini-code-assist · 2026-05-22T09:36:51Z

+        x_min, x_max = coords[0].min(), coords[0].max()
+        y_min, y_max = coords[1].min(), coords[1].max()


If the mask is all zeros (e.g., if background removal fails or the image is empty), np.nonzero(mask) will return empty arrays. Calling .min() or .max() on an empty array will raise a ValueError. You should check if the mask contains any non-zero elements first.

Suggested change

-        x_min, x_max = coords[0].min(), coords[0].max()
-        y_min, y_max = coords[1].min(), coords[1].max()
+        coords = np.nonzero(mask)
+        if coords[0].size == 0:
+            raise ValueError("input image is empty")
+        x_min, x_max = coords[0].min(), coords[0].max()
+        y_min, y_max = coords[1].min(), coords[1].max()
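
A self-contained sketch of the guarded bounding-box computation with NumPy. The function name `mask_bounds` and the sample mask are hypothetical; the guard itself follows the suggestion above.

```python
import numpy as np


def mask_bounds(mask: np.ndarray):
    """Return ((min, max) along axis 0, (min, max) along axis 1) of the nonzero region."""
    coords = np.nonzero(mask)
    # An all-zero mask gives empty index arrays, and .min()/.max() on those would raise
    # a less descriptive ValueError, so fail early with a clear message.
    if coords[0].size == 0:
        raise ValueError("input image is empty")
    return (coords[0].min(), coords[0].max()), (coords[1].min(), coords[1].max())


mask = np.zeros((5, 5), dtype=np.uint8)
mask[1:3, 2:5] = 1  # hypothetical foreground region
bounds = mask_bounds(mask)
print(bounds)
```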

gemini-code-assist · 2026-05-22T09:36:51Z

+
+    expert_cache = torch.zeros_like(flat)
+    idxs = flat_topk_idx.argsort()
+    tokens_per_expert = flat_topk_idx.bincount(minlength=moe_weights.num_experts).cpu().numpy().cumsum(0)


Moving data from GPU to CPU (.cpu().numpy()) inside the inference loop causes a device synchronization, which can significantly degrade performance. Consider performing these operations on the GPU using PyTorch operations like torch.cumsum.
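
A rough sketch of keeping the offsets on the device, using a made-up routing tensor and expert count. It only covers the `bincount` and `cumsum` step; if the surrounding loop still needs Python integers as slice bounds, some host transfer remains, so this is not a complete removal of the synchronization.

```python
import torch

num_experts = 4  # hypothetical expert count
flat_topk_idx = torch.tensor([0, 2, 2, 1, 0, 2])  # hypothetical routing result

# bincount and cumsum both run on the tensor's device; nothing is copied to the host here.
tokens_per_expert = torch.cumsum(flat_topk_idx.bincount(minlength=num_experts), dim=0)

# If Python integers are still required, a single .tolist() call keeps the transfer
# to one place (it is still a synchronization point on a GPU).
boundaries = tokens_per_expert.tolist()
print(boundaries)
```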

gemini-code-assist · 2026-05-22T09:36:51Z

+        expert_tokens = flat[exp_token_idx]
+        expert_out = infer_moe_ffn(moe_weights.experts[expert_idx], expert_tokens)
+        expert_out.mul_(flat_topk_weight[idxs[start_idx:end_idx]])
+        expert_cache = expert_cache.to(expert_out.dtype)


Re-casting expert_cache to expert_out.dtype inside the loop is redundant. The cast allocates a new tensor the first time the dtypes differ, and on later iterations the call does nothing useful because the dtype already matches. It's better to initialize expert_cache with the desired dtype or cast it once outside the loop.
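
A minimal sketch of allocating the accumulation buffer in the target dtype up front. The shapes, the dtype, the token ranges, and the slice-based accumulation are all stand-ins chosen for illustration; the PR's actual expert FFN and write-back step are not reproduced here.

```python
import torch

flat = torch.randn(6, 8, dtype=torch.float32)  # hypothetical flattened tokens
out_dtype = torch.bfloat16  # hypothetical dtype of the expert outputs

# Allocate the buffer in the target dtype once, before the loop.
expert_cache = torch.zeros_like(flat, dtype=out_dtype)

for start, end in [(0, 2), (2, 6)]:  # hypothetical per-expert token ranges
    expert_out = flat[start:end].to(out_dtype)  # stand-in for the expert FFN output
    expert_cache[start:end] += expert_out  # the buffer itself is never re-cast

print(expert_cache.dtype)
```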

gemini-code-assist · 2026-05-22T09:36:51Z

+        torch_device_module.empty_cache()
+        gc.collect()


Calling torch.cuda.empty_cache() and gc.collect() at the end of every encoder run is extremely expensive and can lead to significant performance overhead due to GPU synchronization and memory fragmentation. These should generally be avoided unless there is a specific, verified memory leak that cannot be addressed otherwise.

gemini-code-assist Bot reviewed May 22, 2026

[feat]: support hy3d-2.1

de28501

Watebear force-pushed the hy3d branch from c8a3226 to de28501 Compare May 22, 2026 09:45

Watebear wants to merge 1 commit into main from hy3d
