Question about fine-tuning OpenVLA‑OFT with non‑RLDS (GR00T / LeRobot v2 style) datasets #152

@fsiken

Hi, thank you for releasing OpenVLA‑OFT – this repository is extremely helpful.

I have a question about the intended data interface for fine‑tuning, especially regarding datasets that are not converted to RLDS.


Background

I have demonstration data generated with Isaac GR00T (v1.6 / N1.6), structured in a LeRobot v2–style format, roughly as:

    .
    ├─meta
    │ ├─episodes.jsonl
    │ ├─modality.json   # -> GR00T LeRobot specific
    │ ├─info.json
    │ └─tasks.jsonl
    ├─videos
    │ └─chunk-000
    │   └─observation.images.front
    │     ├─episode_000001.mp4
    │     └─episode_000000.mp4
    └─data
      └─chunk-000
        ├─episode_000001.parquet
        └─episode_000000.parquet

Each episode contains:

  • RGB images (front camera)
  • robot state / proprio (stored in parquet)
  • continuous actions
  • a task description (language)

This format is compatible with how GR00T / LeRobot v2 store demonstrations.
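
To make the layout concrete, here is a minimal sketch of how each episode's parquet file can be paired with its front-camera video. The function name `index_episodes` and the `camera` default are illustrative assumptions, not part of GR00T, LeRobot, or OpenVLA‑OFT; the only thing taken from the tree above is the directory and file naming.

```python
from pathlib import Path
from typing import Dict, List


def index_episodes(
    root: Path, camera: str = "observation.images.front"
) -> List[Dict[str, Path]]:
    """Pair every data/chunk-*/episode_*.parquet with its matching video.

    Hypothetical helper: it assumes the video lives under the same chunk
    name and shares the parquet file's stem, as in the tree above.
    """
    episodes = []
    for parquet in sorted((root / "data").glob("chunk-*/episode_*.parquet")):
        video = root / "videos" / parquet.parent.name / camera / f"{parquet.stem}.mp4"
        if not video.exists():
            raise FileNotFoundError(f"no video for {parquet.name}: {video}")
        episodes.append({"parquet": parquet, "video": video})
    return episodes
```

Calling `index_episodes(Path("/path/to/dataset"))` returns one dict per episode, sorted by chunk and episode name; reading the parquet contents and decoding frames are left out on purpose.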


What I tried

I attempted to fine‑tune OpenVLA‑OFT by:

  • writing a custom PyTorch Dataset / DataLoader
  • producing batches shaped like:
    • observation = { full_image, state, task_description }
    • actions = (current + future action chunk)
  • using action chunking + L1 regression, following vla-scripts/finetune.py
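
For reference, the chunking step can be sketched roughly as below. This is a simplified, dependency-free illustration and not the actual patch: the class name `ChunkedEpisode`, the default `chunk_len=8`, and the choice to pad the end of an episode by repeating the last action are all assumptions. Image decoding is omitted, so `frame_index` stands in for `full_image`.

```python
from typing import Any, Dict, List, Sequence


class ChunkedEpisode:
    """Map-style dataset over one episode (hypothetical helper).

    Item t pairs the observation at step t with the actions for steps
    t .. t + chunk_len - 1 (current action plus future actions).
    """

    def __init__(
        self,
        states: Sequence[Sequence[float]],
        actions: Sequence[Sequence[float]],
        task_description: str,
        chunk_len: int = 8,  # assumed value, not taken from the repository
    ) -> None:
        if len(states) != len(actions):
            raise ValueError("states and actions must have the same length")
        if not actions:
            raise ValueError("episode must contain at least one step")
        self.states = states
        self.actions = actions
        self.task_description = task_description
        self.chunk_len = chunk_len

    def __len__(self) -> int:
        return len(self.actions)

    def __getitem__(self, t: int) -> Dict[str, Any]:
        if not 0 <= t < len(self):
            raise IndexError(t)
        last = len(self.actions) - 1
        # Assumption: past the episode end, repeat the final action as padding.
        chunk: List[List[float]] = [
            list(self.actions[min(t + k, last)]) for k in range(self.chunk_len)
        ]
        return {
            "observation": {
                "frame_index": t,  # placeholder for the decoded full_image
                "state": list(self.states[t]),
                "task_description": self.task_description,
            },
            "actions": chunk,
        }
```

Because the class only implements `__len__` and `__getitem__`, it can be wrapped by `torch.utils.data.DataLoader` as a map-style dataset once the lists are converted to tensors in a collate function.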

However, this required significant patching of:

  • experiments/robot/openvla_utils.py
  • assumptions around RLDS, dataset_statistics.json, and pretrained action heads

At this point I’m unsure whether:

  • this approach is supported but undocumented, or
  • fine‑tuning is intentionally designed around RLDS only, and non‑RLDS inputs are out of scope.

My question

What is the officially intended / recommended way to fine‑tune OpenVLA‑OFT with datasets that are not in RLDS format?

Specifically:

  1. Is converting datasets (like GR00T / LeRobot v2) to RLDS considered the expected path?
  2. Or is there a supported way to plug in a custom loader that outputs observation dicts + action chunks?
  3. If the latter is possible, is there an example or minimal reference for the expected batch format?

I fully understand if RLDS is the only supported interface – I mainly want to confirm the design intent so I can align with it instead of fighting the codebase.

Thanks again for your work, and for any guidance!
