After full fine-tuning of a Qwen2.5-VL model with `finetuning_type: full` and `template: qwen2_vl`, inference produces abnormal output or the model fails to load

---

## Environment
- **LlamaFactory version**: 0.8.3
- **Transformers version**: 4.52.3
- **PyTorch version**: 2.1.0
- **Python version**: 3.10
- **CUDA version**: 11.8
- **Operating system**: Linux
- **Model**: Qwen/Qwen2.5-VL-3B-Instruct

---

## Training Command & Config

Command:
```bash
export CUDA_VISIBLE_DEVICES=2,3
llamafactory-cli train train/Qwen2_5-VL/example/3B_full_QA_train_bs8_b2.yaml
```

Config file contents:
```yaml
bf16: true
cutoff_len: 4096
dataset: QA
ddp_timeout: 180000000
deepspeed: train/utils/deepspeed/ds_z3_config.json
do_train: true
eval_steps: 100
eval_strategy: 'no'
finetuning_type: full
gradient_accumulation_steps: 8
image_max_pixels: 262144
learning_rate: 5.0e-06
logging_steps: 10
lr_scheduler_type: cosine
model_name_or_path: /jfs/auto.prod.sz/users/bak/Qwen2.5-VL-3B-Instruct
num_train_epochs: 2.0
output_dir: ./output/test_run_10
overwrite_cache: true
overwrite_output_dir: false
per_device_eval_batch_size: 1
per_device_train_batch_size: 1
plot_loss: true
preprocessing_num_workers: 32
save_steps: 10
stage: sft
template: qwen2_vl
val_size: 0.01
warmup_ratio: 0.03
max_steps: 10
```

Training ran normally, the `loss` decreased steadily, and the model was saved to `output_dir`.
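Before running inference, it may help to confirm that the saved directory is at least structurally complete. The following is a minimal sketch, not part of LlamaFactory: `missing_checkpoint_files` is a hypothetical helper, and the `EXPECTED_FILES` list is an assumption that should be adjusted to match whatever the base model directory contains. It relies only on the Hugging Face sharded-checkpoint convention that `model.safetensors.index.json` holds a `weight_map` from tensor name to shard file.

```python
import json
from pathlib import Path

# Assumed set of files a Qwen2.5-VL checkpoint directory usually contains;
# adjust this list to mirror the base model directory.
EXPECTED_FILES = [
    "config.json",
    "generation_config.json",
    "preprocessor_config.json",
    "tokenizer_config.json",
]


def missing_checkpoint_files(ckpt_dir, expected=EXPECTED_FILES):
    """Return the sorted names of expected files and weight shards that are absent."""
    root = Path(ckpt_dir)
    missing = {name for name in expected if not (root / name).is_file()}

    index_path = root / "model.safetensors.index.json"
    if index_path.is_file():
        # Sharded checkpoint: every shard named in weight_map must exist on disk.
        weight_map = json.loads(index_path.read_text())["weight_map"]
        missing |= {s for s in set(weight_map.values()) if not (root / s).is_file()}
    elif not (root / "model.safetensors").is_file():
        # Unsharded checkpoint: a single weights file is expected instead.
        missing.add("model.safetensors")
    return sorted(missing)
```

Calling `missing_checkpoint_files("./output/test_run_10")` and getting an empty list would only show that the files are present; it says nothing about whether the tensors inside them are valid.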

---

## Inference Problem

After training, I ran inference directly on the model in **output_dir**, testing both **SGLang** and **vLLM**. The results are as follows:

### Inference with SGLang
Command:
```bash
python train/inference_scripts/sglang_infer.py \
  --model_name_or_path ./output/test_run_4 \
  --dataset QA \
  --save_name ./output/reference/test/final_test_result_run6.jsonl \
  --template qwen2_vl \
  --tensor_parallel_size 1 \
  --max_samples 2
```

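Behavior:
- The model loads normally but outputs long runs of repeated, meaningless characters (e.g. `!!!!`).
- The image base64 data is loaded and passed in correctly, yet the model appears unable to process the image input.

Excerpt from the inference script: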
```python
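# Build a one-request batch from pre-tokenized prompt IDs plus the image payload.
obj = GenerateReqInput(
    input_ids=[input_data["prompt_token_ids"]],
    image_data=[input_data["multi_modal_data"]],
    sampling_params=sampling_params,
)
# generate_request returns an async generator; block until its first result arrives.
generator = llm.tokenizer_manager.generate_request(obj, None)
ret = loop.run_until_complete(generator.__anext__())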
```

---

### Inference with vLLM
Command:
```bash
python train/inference_scripts/vllm_infer.py \
  --model_name_or_path ./output/test_run_4 \
  --dataset QA \
  --save_name ./output/reference/test/vllm_test_result.jsonl \
  --template qwen2_vl \
  --tensor_parallel_size 1 \
  --max_samples 2
```

Result:
```
KeyError: 'language_model.embed_tokens.weight'
```

The model fails to load outright.
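To see which tensor names the checkpoint actually contains, the `.safetensors` header can be read without loading any weights. This is a minimal sketch using only the standard library and the documented safetensors layout (an 8-byte little-endian header length followed by a JSON header); `safetensors_tensor_names` and `find_tensor_names` are hypothetical helper names, not part of any library.

```python
import json
import struct
from pathlib import Path


def safetensors_tensor_names(path):
    """Read tensor names from a .safetensors header without loading weights."""
    with open(path, "rb") as f:
        # The file starts with an unsigned 64-bit little-endian header size.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    # "__metadata__" is an optional free-form entry, not a tensor.
    return [name for name in header if name != "__metadata__"]


def find_tensor_names(ckpt_dir, needle):
    """Return every tensor name in ckpt_dir's shards that contains `needle`."""
    hits = []
    for shard in sorted(Path(ckpt_dir).glob("*.safetensors")):
        hits.extend(n for n in safetensors_tensor_names(shard) if needle in n)
    return sorted(hits)
```

For example, `find_tensor_names("./output/test_run_4", "embed_tokens")` would show the full name under which the embedding weight was saved, which can then be compared with the key in the error message and with the same call on the base model directory.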

---

## Comparison
Loading the **original base model** with the same inference script and data:
```bash
python train/inference_scripts/sglang_infer.py \
  --model_name_or_path /jfs/auto.prod.sz/users/bak/Qwen2.5-VL-3B-Instruct \
  --dataset QA \
  --save_name ./output/reference/base_model_inference_result.jsonl \
  --template qwen2_vl \
  --tensor_parallel_size 1 \
  --max_samples 2
```

The output is completely normal: the model understands the image correctly and generates a reasonable answer.

---

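## Possible Cause

I suspect that when `llamafactory-cli train` performs full fine-tuning (`finetuning_type: full`) on a multimodal model, the model files saved to `output_dir` (weights or config) are **corrupted or incomplete**.

Observations:
- The weight structure is abnormal (vLLM raises a `KeyError`).
- The model's vision branch may not have been saved correctly (SGLang outputs meaningless characters).
- Replacing `config.json` with the base model's copy does not fix the problem.
- The problem appears to lie in the save logic rather than in inference or the template.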
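One way to test the observations above is to compare how tensor names are laid out in the base and fine-tuned checkpoints. The sketch below is a hypothetical helper, and it assumes both directories are sharded checkpoints containing `model.safetensors.index.json`. It groups names by their leading dot-separated components and reports prefixes that appear on only one side. If the fine-tuned side shows a `language_model` segment that the base side lacks, that would suggest a naming or version mismatch between the library that saved the weights and the inference engine, rather than damaged weights; this is only a hypothesis to check.

```python
import json
from collections import Counter
from pathlib import Path


def prefix_counts(ckpt_dir, depth=2):
    """Count tensor names by their first `depth` dot-separated components."""
    index_path = Path(ckpt_dir) / "model.safetensors.index.json"
    weight_map = json.loads(index_path.read_text())["weight_map"]
    return Counter(".".join(name.split(".")[:depth]) for name in weight_map)


def layout_diff(base_dir, tuned_dir, depth=2):
    """Return (prefixes only in base, prefixes only in tuned), each sorted."""
    base = prefix_counts(base_dir, depth)
    tuned = prefix_counts(tuned_dir, depth)
    return sorted(set(base) - set(tuned)), sorted(set(tuned) - set(base))
```

Two empty lists from `layout_diff(base_dir, tuned_dir)` would mean the name layouts match at that depth, in which case the tensor values themselves would be the next thing to inspect.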

---

## Summary

| Model source | Inference script | Result |
|-----------|-----------|------|
| Original Qwen2.5-VL-3B-Instruct | SGLang / vLLM | Normal |
| Fully fine-tuned model (`finetuning_type: full`) | SGLang | Outputs `!!!!` |
| Fully fine-tuned model (`finetuning_type: full`) | vLLM | ❌ Error: `KeyError: 'language_model.embed_tokens.weight'` |

---

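**Note**:
The training logs are normal and the inference scripts work fine when loading the original model, so I currently attribute the problem to the **stage where the model is saved to `output_dir`**.

## Request
Did something go wrong during training? The loss decreased normally throughout training, so why does inference fail afterwards?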
