Conversation
There was a problem hiding this comment.
Code Review
This pull request integrates the Hunyuan3D-2.1 image-to-3D-mesh generation pipeline, introducing new model runners, DiT network components with Mixture-of-Experts (MoE) support, flow-matching schedulers, and VAE-based mesh decoding utilities. It also includes image preprocessing with background removal and a torchvision compatibility fix. The review feedback identifies several implementation issues, including a dictionary iteration bug in the configuration logic, unreachable code in the conditioner, and potential crashes in the image processor. Furthermore, several performance optimizations are recommended for the MoE inference and runner modules to avoid unnecessary device synchronizations and excessive memory management overhead.
| else: | ||
| raise ValueError(f"{config_file} must be `.yaml` file or it contains `base_config` key.") | ||
|
|
||
| config_file = {key: value for key, value in config_file if key != "base_config"} |
There was a problem hiding this comment.
Iterating directly over a DictConfig (or any dict-like object) yields only the keys. Attempting to unpack it as key, value will raise a ValueError. You should use .items() to iterate over key-value pairs.
| config_file = {key: value for key, value in config_file if key != "base_config"} | |
| config_file = {key: value for key, value in config_file.items() if key != "base_config"} |
| if self.disable_drop: | ||
| return outputs | ||
| else: | ||
| random_p = torch.rand(len(image), device="cuda") |
There was a problem hiding this comment.
| outputs = { | ||
| "main": self.main_image_encoder(image, mask=mask, **kwargs), | ||
| } | ||
| return outputs |
| image = Image.open(image_path) | ||
| if image.mode == "RGB" and self.enable_rembg: | ||
| if self.rembg is None: | ||
| self.rembg = BackgroundRemover() | ||
| return self.rembg(image.convert("RGB")) | ||
| return image.convert("RGBA") |
There was a problem hiding this comment.
The image file is opened but not explicitly closed. Using a context manager ensures the file handle is released promptly.
| image = Image.open(image_path) | |
| if image.mode == "RGB" and self.enable_rembg: | |
| if self.rembg is None: | |
| self.rembg = BackgroundRemover() | |
| return self.rembg(image.convert("RGB")) | |
| return image.convert("RGBA") | |
| def __call__(self, image_path: str) -> Image.Image: | |
| with Image.open(image_path) as image: | |
| if image.mode == "RGB" and self.enable_rembg: | |
| if self.rembg is None: | |
| self.rembg = BackgroundRemover() | |
| return self.rembg(image.convert("RGB")) | |
| return image.convert("RGBA") |
| x_min, x_max = coords[0].min(), coords[0].max() | ||
| y_min, y_max = coords[1].min(), coords[1].max() |
There was a problem hiding this comment.
If the mask is all zeros (e.g., if background removal fails or the image is empty), np.nonzero(mask) will return empty arrays. Calling .min() or .max() on an empty array will raise a ValueError. You should check if the mask contains any non-zero elements first.
| x_min, x_max = coords[0].min(), coords[0].max() | |
| y_min, y_max = coords[1].min(), coords[1].max() | |
| coords = np.nonzero(mask) | |
| if coords[0].size == 0: | |
| raise ValueError("input image is empty") | |
| x_min, x_max = coords[0].min(), coords[0].max() |
|
|
||
| expert_cache = torch.zeros_like(flat) | ||
| idxs = flat_topk_idx.argsort() | ||
| tokens_per_expert = flat_topk_idx.bincount(minlength=moe_weights.num_experts).cpu().numpy().cumsum(0) |
| expert_tokens = flat[exp_token_idx] | ||
| expert_out = infer_moe_ffn(moe_weights.experts[expert_idx], expert_tokens) | ||
| expert_out.mul_(flat_topk_weight[idxs[start_idx:end_idx]]) | ||
| expert_cache = expert_cache.to(expert_out.dtype) |
| torch_device_module.empty_cache() | ||
| gc.collect() |
There was a problem hiding this comment.
Calling torch.cuda.empty_cache() and gc.collect() at the end of every encoder run is extremely expensive and can lead to significant performance overhead due to GPU synchronization and memory fragmentation. These should generally be avoided unless there is a specific, verified memory leak that cannot be addressed otherwise.
No description provided.