
Question: Using the LLaVA-NeXT Dataset for SFT #28

@nihaotian1

Hi, thanks for your great work.

While attempting to reproduce the LLaDA-V ablation study results, I successfully executed scripts/llada_v_pretrain.sh, but I then ran into two issues in the subsequent steps.

  1. I increased per_device_batch_size from 1 to 8 while running bash scripts/train_ablation/llada_v_sft.sh. GPU memory usage rose accordingly, but the total training time remained unchanged at approximately 72 hours on 4 GPUs (see the step-count sketch below the screenshot).

  2. I observed that grad_norm was reported as 0.0 throughout training. Is this expected, or could it indicate a problem? (My only modifications were to the training scripts; the rest of the original code is unchanged. See the grad-norm check below.)

[Screenshot of the training log]

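For reference on point 1, this is the arithmetic I would expect to govern the step count. It is a minimal sketch with placeholder values for the dataset size and gradient accumulation (not numbers taken from the repo); only the GPU count and batch sizes are from my setup. If the estimated total time does not drop despite ~8x fewer optimizer steps, that might suggest the batch-size flag is being overridden elsewhere (e.g., by a DeepSpeed config) or that the data pipeline is the bottleneck:

```python
# Sanity check for point 1: with everything else fixed, an 8x larger
# per-device batch should mean roughly 8x fewer optimizer steps per epoch.
num_gpus = 4                  # from my setup
grad_accum_steps = 1          # placeholder; whatever llada_v_sft.sh actually sets
num_samples = 1_000_000       # placeholder for the SFT dataset size

for per_device_batch_size in (1, 8):
    effective_batch = per_device_batch_size * num_gpus * grad_accum_steps
    steps_per_epoch = -(-num_samples // effective_batch)  # ceiling division
    print(f"batch/device={per_device_batch_size}: "
          f"effective batch={effective_batch}, steps/epoch={steps_per_epoch}")
```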
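For point 2, to check whether the gradients are actually zero rather than a logging quirk, here is a minimal PyTorch sketch (assuming direct access to the model inside the training loop; under DeepSpeed ZeRO the per-parameter .grad fields may be sharded, so the trainer-reported norm and this direct check can disagree):

```python
import torch

def total_grad_norm(model: torch.nn.Module) -> float:
    """Global L2 norm over all parameter gradients; params with no grad are skipped."""
    sq_sum = 0.0
    for p in model.parameters():
        if p.grad is not None:
            sq_sum += p.grad.detach().float().pow(2).sum().item()
    return sq_sum ** 0.5

# Intended usage in the training loop, right after loss.backward()
# and before optimizer.step():
#     print(f"grad norm = {total_grad_norm(model):.6f}")
```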