81 changes: 81 additions & 0 deletions 2025-Ascend-Innovation-Contest/S1/MultiModal/YangBros/README.md
@@ -0,0 +1,81 @@
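# MindNLP Model Optimization Details (Qwen2-VL-2B-Instruct & Janus-Pro-7B)

This document records the key performance optimizations applied to the Qwen2-VL-2B-Instruct and Janus-Pro-7B models, together with the corresponding core code.

## 1. Qwen2-VL-2B Model Optimization

### 1.1 Multimodal inference acceleration: decode stage (whole-network jit acceleration)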

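Pain point: the original implementation may have used inefficient loops or an indexing approach that is incompatible with dynamic graphs.

Improvement: use `mint.nonzero` to obtain sparse indices, and optimize the index-addition logic.

**Source implementation** (`utils.py`):

```python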
import mindspore


@mindspore.jit(jit_level='O1', infer_boost="on", jit_config=mindspore.JitConfig(jit_syntax_level='STRICT'))
def _call_model_forward(model,
                        inputs_embeds,
                        input_ids,
                        position_ids,
                        cache_position,
                        past_key_values,
                        use_cache,
                        attention_mask,
                        pixel_values,
                        pixel_values_videos,
                        image_grid_thw,
                        video_grid_thw,
                        rope_deltas,
                        return_dict,
                        ):
    """
    Wrap the call to model.forward so that the whole forward pass runs under mindspore.jit.

    Args:
        model: The model whose forward method is invoked.
        inputs_embeds, input_ids, ..., return_dict: Passed through unchanged,
            by keyword, to model.forward.

    Returns:
        The return value of the model's forward method.
    """
    # Pass every argument straight through to forward by keyword.
    return model.forward(
        inputs_embeds=inputs_embeds,
        input_ids=input_ids,
        position_ids=position_ids,
        cache_position=cache_position,
        past_key_values=past_key_values,
        use_cache=use_cache,
        attention_mask=attention_mask,
        pixel_values=pixel_values,
        pixel_values_videos=pixel_values_videos,
        image_grid_thw=image_grid_thw,
        video_grid_thw=video_grid_thw,
        rope_deltas=rope_deltas,
        return_dict=return_dict,
    )
```
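
The improvement noted in this section, collecting sparse indices with a nonzero call instead of looping, can be sketched as follows. This is only an illustration under stated assumptions: the README does not show which tensors are involved, so the sketch assumes a placeholder-token scatter, uses NumPy's `np.nonzero` as a stand-in for `mint.nonzero`, and the names `IMAGE_TOKEN_ID`, `scatter_loop`, and `scatter_nonzero` are hypothetical.

```python
import numpy as np

# Hypothetical placeholder id; not taken from any model config.
IMAGE_TOKEN_ID = 9


def scatter_loop(input_ids, text_embeds, image_embeds):
    """Baseline: visit every position in Python and copy image rows one by one."""
    out = text_embeds.copy()
    next_image_row = 0
    for pos, token in enumerate(input_ids):
        if token == IMAGE_TOKEN_ID:
            out[pos] = image_embeds[next_image_row]
            next_image_row += 1
    return out


def scatter_nonzero(input_ids, text_embeds, image_embeds):
    """Vectorized: a single nonzero call yields all placeholder positions."""
    (positions,) = np.nonzero(input_ids == IMAGE_TOKEN_ID)
    out = text_embeds.copy()
    # One indexed assignment replaces the per-position loop.
    out[positions] = image_embeds[: positions.size]
    return out


input_ids = np.array([1, 9, 9, 4, 9, 2])
text_embeds = np.zeros((6, 3), dtype=np.float32)
image_embeds = np.arange(9, dtype=np.float32).reshape(3, 3) + 1.0

loop_result = scatter_loop(input_ids, text_embeds, image_embeds)
vector_result = scatter_nonzero(input_ids, text_embeds, image_embeds)
```

Both functions produce the same array. The vectorized form avoids data-dependent Python control flow, which is usually the part that is hard to express as a compiled graph.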

## Final results
| model_name | memory_reserved | memory_allocated | avg_prefill_latency | avg_decode_latency |
| :--- | :--- | :--- | :--- | :--- |
| Qwen2-VL-2B-Instruct | 8.589934592 | 7.225426432 | 0.7505903244018555 | 0.06681718111038208 |
| Janus-Pro-7B | 17.179869184 | 15.678765056 | 0.6394170522689819 | 0.049347045421600344 |
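
The table does not state units for the memory columns. As a rough check, and assuming the values are in units of 10^9 bytes (an assumption, not something the README confirms), the `memory_reserved` figures correspond to exactly 8 GiB and 16 GiB:

```python
GIB = 2 ** 30  # bytes per GiB

# memory_reserved values copied from the table above; unit of 1e9 bytes is assumed.
reserved_gb = {
    "Qwen2-VL-2B-Instruct": 8.589934592,
    "Janus-Pro-7B": 17.179869184,
}

# Convert to whole bytes first so the division by 2**30 is exact.
reserved_gib = {name: round(gb * 1e9) / GIB for name, gb in reserved_gb.items()}
```

Under that assumption, `reserved_gib` maps the two models to 8.0 and 16.0 respectively.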


## Evaluation results

| Metric | Average score |
|---------|---------|
| Peak memory score | 100.0 |
| Prefill latency score | 102.1077 |
| Decode latency score | 158.1359 |
| **Total score** | **120.0812** |
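
The README does not say how the total is derived. The figures in the table are consistent with an unweighted mean of the three component scores, which the following short check illustrates (the equal weighting is an inference from the numbers, not a documented scoring rule):

```python
# Component scores copied from the table above.
scores = {
    "peak_memory": 100.0,
    "prefill_latency": 102.1077,
    "decode_latency": 158.1359,
}

# Assumed rule: the total is the plain average of the three scores.
total = sum(scores.values()) / len(scores)
print(f"{total:.4f}")  # prints 120.0812
```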

Collaborator review comment:

Looks fine, but I'm curious: does applying jit to the whole network take effect directly, just like that?
