Hi, thank you for releasing OpenVLA‑OFT – this repository is extremely helpful.
I have a question about the intended data interface for fine‑tuning, especially regarding datasets that are not converted to RLDS.
### Background
I have demonstration data generated with Isaac GR00T (v1.6 / N1.6), structured in a LeRobot v2–style format, roughly as:
```
.
├─ meta
│  ├─ episodes.jsonl
│  ├─ modality.json   # -> GR00T LeRobot specific
│  ├─ info.json
│  └─ tasks.jsonl
├─ videos
│  └─ chunk-000
│     └─ observation.images.front
│        ├─ episode_000000.mp4
│        └─ episode_000001.mp4
└─ data
   └─ chunk-000
      ├─ episode_000000.parquet
      └─ episode_000001.parquet
```
Each episode contains:
- RGB images (front camera)
- robot state / proprio (stored in parquet)
- continuous actions
- a task description (language)
This format is compatible with how GR00T / LeRobot v2 store demonstrations.
### What I tried
I attempted to fine‑tune OpenVLA‑OFT by:
- writing a custom PyTorch `Dataset` / `DataLoader`
- producing batches shaped like:
  - `observation = { full_image, state, task_description }`
  - `actions` = (current + future action chunk)
- using action chunking + L1 regression, following `vla-scripts/finetune.py`
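For reference, the batch interface I implemented looks roughly like this. This is a minimal sketch with dummy data; the class name, dict keys, and `chunk_len` parameter are my own naming, not something taken from the repo:

```python
import numpy as np
import torch
from torch.utils.data import Dataset

class GrootEpisodeDataset(Dataset):
    """Sketch: yields one sample per timestep, with an action chunk
    covering the current + future actions (hypothetical interface)."""

    def __init__(self, episodes, chunk_len=8):
        # episodes: list of dicts with "images" (T,H,W,3), "state" (T,D_s),
        # "actions" (T,D_a), "task" (str) -- e.g. decoded from parquet/mp4
        self.episodes = episodes
        self.chunk_len = chunk_len
        self.index = [(e, t) for e, ep in enumerate(episodes)
                      for t in range(len(ep["actions"]))]

    def __len__(self):
        return len(self.index)

    def __getitem__(self, i):
        e, t = self.index[i]
        ep = self.episodes[e]
        T = len(ep["actions"])
        # pad the chunk at the episode boundary by repeating the last action
        idxs = np.minimum(np.arange(t, t + self.chunk_len), T - 1)
        return {
            "observation": {
                "full_image": torch.from_numpy(ep["images"][t]),
                "state": torch.from_numpy(ep["state"][t]),
                "task_description": ep["task"],
            },
            "actions": torch.from_numpy(ep["actions"][idxs]),  # (chunk_len, D_a)
        }

# toy episode just to show the shapes
ep = {
    "images": np.zeros((10, 224, 224, 3), dtype=np.uint8),
    "state": np.zeros((10, 7), dtype=np.float32),
    "actions": np.zeros((10, 7), dtype=np.float32),
    "task": "pick up the block",
}
ds = GrootEpisodeDataset([ep], chunk_len=8)
sample = ds[9]
print(sample["actions"].shape)  # torch.Size([8, 7])
```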
However, this required significant patching of:
- `experiments/robot/openvla_utils.py`
- assumptions around RLDS, `dataset_statistics.json`, and pretrained action heads
At this point I’m unsure whether:
- this approach is supported but undocumented, or
- fine‑tuning is intentionally designed around RLDS only, and non‑RLDS inputs are out of scope.
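(For context, when patching around `dataset_statistics.json` I recomputed normalization statistics myself along these lines. This is a sketch with made-up data, and I'm assuming the file holds per-dimension mean/std and 1st/99th percentiles used for [-1, 1] normalization, which is what it appeared to contain; please correct me if that's wrong:)

```python
import json
import numpy as np

# all actions from all episodes, stacked into (N, D_a) -- dummy data here
actions = np.random.default_rng(0).normal(size=(1000, 7)).astype(np.float32)

stats = {
    "action": {
        "mean": actions.mean(axis=0).tolist(),
        "std": actions.std(axis=0).tolist(),
        "q01": np.quantile(actions, 0.01, axis=0).tolist(),
        "q99": np.quantile(actions, 0.99, axis=0).tolist(),
    }
}
print(json.dumps(stats["action"]["mean"]))

# normalize to [-1, 1] using the q01/q99 bounds (my assumption of the scheme)
q01 = np.array(stats["action"]["q01"])
q99 = np.array(stats["action"]["q99"])
normalized = np.clip(2 * (actions - q01) / (q99 - q01) - 1, -1, 1)
```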
### My question
What is the officially intended / recommended way to fine‑tune OpenVLA‑OFT with datasets that are not in RLDS format?
Specifically:
- Is converting datasets (like GR00T / LeRobot v2) to RLDS considered the expected path?
- Or is there a supported way to plug in a custom loader that outputs observation dicts + action chunks?
- If the latter is possible, is there an example or minimal reference for the expected batch format?
I fully understand if RLDS is the only supported interface – I mainly want to confirm the design intent so I can align with it instead of fighting the codebase.
Thanks again for your work, and for any guidance!