Hi,
thanks for your great work.
While attempting to reproduce the LLaDA-V ablation study results, I successfully executed `scripts/llada_v_pretrain.sh`. However, I then encountered several issues in the subsequent steps:
- I attempted to increase `per_device_batch_size` from 1 to 8 while running `bash scripts/train_ablation/llada_v_sft.sh`. Although GPU memory usage increased, the total training time remained unchanged at approximately 72 hours on 4 GPUs (see the batch-size sketch below).
- I observed that `grad_norm` was reported as 0.0 during training. Is this expected, or could it indicate a problem? The only modifications I made were to the training scripts; the rest of the original code is unchanged. (See the gradient check below.)
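For context, here is my mental model of why I expected the wall-clock time to drop. This is just a rough sketch; `dataset_size` and `effective_batch_size` are hypothetical placeholders, not values from the repo:

```python
# Hypothetical numbers: dataset_size and effective_batch_size are
# placeholders, not values taken from the LLaDA-V configs.
dataset_size = 1_000_000       # training samples (hypothetical)
num_gpus = 4
effective_batch_size = 256     # global batch size (hypothetical)

for per_device_batch_size in (1, 8):
    # With a fixed effective batch size, larger micro-batches mean
    # fewer gradient-accumulation steps per optimizer step.
    grad_accum_steps = effective_batch_size // (per_device_batch_size * num_gpus)
    optimizer_steps = dataset_size // effective_batch_size
    fwd_bwd_passes = optimizer_steps * grad_accum_steps
    print(f"per_device={per_device_batch_size}: "
          f"accum_steps={grad_accum_steps}, "
          f"optimizer_steps={optimizer_steps}, "
          f"fwd/bwd passes={fwd_bwd_passes}")
```

Going from 1 to 8 cuts the number of forward/backward passes by 8x while the total FLOPs stay the same, so an unchanged training time could mean the GPUs were already compute-bound at batch size 1, or that the script derives its step count some other way. I'd appreciate clarification on which it is.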
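And here is a minimal sketch of how I am double-checking the gradients independently of the trainer's logging (plain PyTorch; `total_grad_norm` is my own helper, not part of the repo):

```python
import torch

def total_grad_norm(model: torch.nn.Module) -> float:
    """L2 norm over all parameter gradients, mirroring what
    torch.nn.utils.clip_grad_norm_ computes before clipping."""
    norms = [p.grad.detach().norm(2)
             for p in model.parameters() if p.grad is not None]
    if not norms:
        return 0.0  # no .grad at all -- backward() may not have run
    return torch.norm(torch.stack(norms), 2).item()

# Usage, inside the training step right after loss.backward():
#     print("grad norm:", total_grad_norm(model))
```

If this also prints 0.0 right after `loss.backward()`, the reported value would be real rather than a logging artifact; otherwise the 0.0 might come from where in the step the trainer logs it.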
