mindspore-lab · ycf7606 · Dec 5, 2025 · Dec 5, 2025 · qhzhuang · Dec 9, 2025
diff --git a/2025-Ascend-Innovation-Contest/S1/MultiModal/YangBros/README.md b/2025-Ascend-Innovation-Contest/S1/MultiModal/YangBros/README.md
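@@ -0,0 +1,76 @@
+# MindNLP Model Optimization Details (Qwen2-VL-2B-Instruct & Janus-Pro-7B)
+
+This document records the key performance optimizations applied to the Qwen2-VL-2B-Instruct and Janus-Pro-7B models, together with the core code that implements them.
+
+## 1. Qwen2-VL-2B Model Optimization
+
+### 1.1 Multimodal Inference Acceleration: Decode Stage (whole-network JIT acceleration)
+
+Pain point: the original implementation may have used inefficient loops or an indexing pattern that is incompatible with dynamic graphs.
+
+Approach: use `mint.nonzero` to obtain sparse indices and optimize the index-addition logic.
+
+**Source implementation** (`utils.py`): the wrapper passes every forward argument explicitly so that the decode-stage forward pass can be compiled by `mindspore.jit`.
+
+```python
+import mindspore
+
+@mindspore.jit(jit_level='O1', infer_boost="on", jit_config=mindspore.JitConfig(jit_syntax_level='STRICT'))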
+def _call_model_forward(model,
+                        inputs_embeds,
+                        input_ids,
+                        position_ids,
+                        cache_position,
+                        past_key_values,
+                        use_cache,
+                        attention_mask,
+                        pixel_values,
+                        pixel_values_videos,
+                        image_grid_thw,
+                        video_grid_thw,
+                        rope_deltas,
+                        return_dict,
+                        ):
+    """
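+    Wrap the call to model.forward so that it runs under mindspore.jit.
+
+    Args:
+        model: The model whose forward method is called.
+        inputs_embeds ... return_dict: Passed unchanged to model.forward.
+
+    Returns:
+        The return value of the model's forward method.
+    """
+    # Every argument is passed explicitly by keyword; no input dict or
+    # **kwargs is merged here.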
+    return model.forward(
+        inputs_embeds=inputs_embeds,
+        input_ids=input_ids,
+        position_ids=position_ids,
+        cache_position=cache_position,
+        past_key_values=past_key_values,
+        use_cache=use_cache,
+        attention_mask=attention_mask,
+        pixel_values=pixel_values,
+        pixel_values_videos=pixel_values_videos,
+        image_grid_thw=image_grid_thw,
+        video_grid_thw=video_grid_thw,
+        rope_deltas=rope_deltas,
+        return_dict=return_dict, )
+```
+
+## Final Gains
+| model_name | memory_reserved (GB) | memory_allocated (GB) | avg_prefill_latency (s) | avg_decode_latency (s) |
+| :--- | :--- | :--- | :--- | :--- |
+| Qwen2-VL-2B-Instruct | 8.589934592 | 7.225426432 | 0.7505903244018555 | 0.06681718111038208 |
+| Janus-Pro-7B | 17.179869184 | 15.678765056 | 0.6394170522689819 | 0.049347045421600344 |
+
+
+## Evaluation Results
+
+| Metric | Average score |
+|---------|---------|
+| Peak memory score | 100.0 |
+| Prefill latency score | 102.1077 |
+| Decode latency score | 158.1359 |
+| **Total score** | **120.0812** |
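Note on the evaluation table above: the total score appears to be the unweighted mean of the three sub-scores. This is inferred from the numbers in the README, not from a stated contest rule, and the dictionary keys below are illustrative names only. A minimal check:

```python
# Sub-scores copied from the evaluation table in the README.
# The key names are hypothetical labels, not contest identifiers.
sub_scores = {
    "peak_memory": 100.0,
    "prefill_latency": 102.1077,
    "decode_latency": 158.1359,
}

# Assumption: the total is the plain average of the three sub-scores.
total = sum(sub_scores.values()) / len(sub_scores)

print(round(total, 4))
```

Under that assumption the computed value matches the reported total of 120.0812 to four decimal places.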
diff --git a/2025-Ascend-Innovation-Contest/S1/MultiModal/YangBros/patches.zip b/2025-Ascend-Innovation-Contest/S1/MultiModal/YangBros/patches.zip