
Add Cosmos3-Nano LIBERO-10 action-policy SFT recipe, config, eval harness, and doc #61

Merged
fwd4 merged 17 commits into NVIDIA:main from fwd4:haolia/libero-action-policy-sft
Jul 2, 2026

fwd4 commented Jun 26, 2026

Collaborator

What

Adds the Cosmos3-Nano LIBERO-10 action-policy SFT surface, mirroring the existing DROID counterpart (action_policy_droid_nano + toml + launcher + doc).

Feature (net-new)

  • Experiment configs action_policy_libero_nano (libero_10-only) and action_policy_libero_all_nano (equal 4-suite mix) — gen + action heads from the public Cosmos3-Nano base.
  • Dataset LIBEROLeRobotDataset + get_action_libero_sft_dataset — frame_wise_relative rot6d, quantile_rot, concat_view (third-person + wrist), 20 fps.
    • base_dataset tasks.parquet fallback for community LIBERO layouts.
    • Resample-on-decode-failure guard so one undecodable packed-mp4 frame can't crash a multi-node run (matches i4 behavior).
  • Closed-loop eval harness with vectorized sim, batched /predict_batch, single-rank no_dist checkpoint load.
  • Structured-prompt serving in the policy server (--format-prompt-as-json), so eval matches the training prompt format; the recipe defaults to it.
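The resample-on-decode-failure guard listed above can be pictured roughly as follows. This is a minimal sketch of the general technique, not the code in this PR: `ResampleOnFailure`, `FlakyFrames`, and `max_retries` are hypothetical names, and the real dataset presumably catches a specific decoder error rather than a bare `Exception`.

```python
import random
from typing import Any, Optional, Sequence


class ResampleOnFailure:
    """Wrap an indexable dataset so a failing sample is swapped for another one."""

    def __init__(self, dataset: Sequence[Any], max_retries: int = 8, seed: int = 0) -> None:
        self.dataset = dataset
        self.max_retries = max_retries  # hypothetical knob, not taken from the PR
        self.rng = random.Random(seed)

    def __len__(self) -> int:
        return len(self.dataset)

    def __getitem__(self, index: int) -> Any:
        last_error: Optional[Exception] = None
        for _ in range(self.max_retries + 1):
            try:
                return self.dataset[index]
            except Exception as err:  # assumption: real code would catch the decoder's error type
                last_error = err
                # Draw a different index instead of letting the worker crash.
                index = self.rng.randrange(len(self.dataset))
        raise RuntimeError(f"gave up after {self.max_retries} resamples") from last_error


class FlakyFrames:
    """Toy stand-in for a packed-mp4 dataset where sample 3 cannot be decoded."""

    def __len__(self) -> int:
        return 10

    def __getitem__(self, index: int) -> int:
        if index == 3:
            raise ValueError("undecodable frame")
        return index


guarded = ResampleOnFailure(FlakyFrames())
sample = guarded[3]  # returns some other sample instead of raising
```

Bounding the retries keeps a systematically broken shard from looping forever while still absorbing the occasional bad frame.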

Recipe + doc — two presets (to match the Cosmos3 LIBERO-10 result)

Both presets use lr 5e-5, warmup 500, cycle 16000, and global batch 2048 (HSDP 2x8):

  • (A) libero_10-only: action_policy_libero_repro.toml + launch_sft_action_policy_libero.sh (max_iter 2000).
  • (B) libero-all (4-suite equal mix): action_policy_libero_all_repro.toml + launch_sft_action_policy_libero_all.sh (max_iter 5000; LIBERO_ROOT = LIBERO_LeRobot_v3 parent dir).
  • docs/action_policy_libero_sft.md documents both.
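As a sanity check on the batch arithmetic, the sketch below assumes that "HSDP 2x8" means 2 replica groups of 8 shards (16 ranks) and that the global batch is the per-rank micro-batch times the number of ranks times the gradient-accumulation steps. Neither assumption is confirmed by the recipe files, and `per_rank_batch` is a hypothetical helper. Under those assumptions, the HSDP 2x8 layout with grad_accum=1 and the earlier single-node grad_accum=2 layout mentioned in the commit notes give the same per-rank micro-batch.

```python
def per_rank_batch(global_batch: int, replicas: int, shards: int, grad_accum: int) -> int:
    """Per-rank micro-batch implied by a global batch (hypothetical helper)."""
    world_size = replicas * shards  # assumption: "2x8" = replicas x shards
    denom = world_size * grad_accum
    if global_batch % denom != 0:
        raise ValueError(f"global batch {global_batch} is not divisible by {denom}")
    return global_batch // denom


# HSDP 2x8, grad_accum=1: 16 ranks, one micro-batch per optimizer step.
hsdp_2x8 = per_rank_batch(2048, replicas=2, shards=8, grad_accum=1)

# Single node (8 ranks), grad_accum=2: two micro-batches per optimizer step.
single_node = per_rank_batch(2048, replicas=1, shards=8, grad_accum=2)

print(hsdp_2x8, single_node)
```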

Notes

  • Scoped to LIBERO only; broader action-dataloader/model changes are intentionally not included here.
  • Based on main.

🤖 Generated with Claude Code

fwd4 and others added 9 commits June 26, 2026 20:59
…ness, and doc

Mirrors the DROID action-policy counterpart (action_policy_droid_nano + repro
toml + launch + doc). Net-new LIBERO feature:
- experiment config: action_policy_libero_nano
- dataset: LIBEROLeRobotDataset + get_action_libero_sft_dataset (frame_wise_relative
  rot6d, quantile_rot, concat_view, 20fps); base_dataset tasks.parquet fallback for
  community LIBERO layouts; resample-on-decode-failure guard (matches i4 behavior)
- closed-loop eval harness (vectorized sim) + batched /predict_batch inference path
  + single-rank no_dist checkpoint load for the policy server
- canonical recipe action_policy_libero_repro.toml + launch_sft_action_policy_libero.sh
  (lr 5e-5, warmup 500, cycle 16000, global batch 2048; ~95% libero_10 500-ep eval)
- docs/action_policy_libero_sft.md

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Trim the toml/config/launch/doc comments (drop SR numbers and experimental
detail), and set the canonical recipe to HSDP 2x8 with grad_accum=1 (global
batch 2048) instead of single-node grad_accum=2.
- action_sft_dataset.py: rebuild as origin/main + libero-only (drop the speedup-era
  ShardedDROIDLeRobotDataset import that broke config load on a clean main).
- remove dataset_reply_action_server.py (GT-replay debug tool, not part of the recipe).
- drop DROID/LoRA references from libero docstrings/comments/doc/launch.
fwd4 force-pushed the haolia/libero-action-policy-sft branch from bd18351 to 4d351dd on June 29, 2026 03:18
xlu451 previously approved these changes Jun 30, 2026
fwd4 and others added 5 commits July 1, 2026 08:47
…on training

Reuse ActionPromptJsonFormatter at serve time so checkpoints trained with
format_prompt_as_json=True receive the same structured JSON prompt at eval.
Adds a --format-prompt-as-json CLI override (None reads the experiment config).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
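
The "None reads the experiment config" behavior described in the commit above is a tri-state override. A minimal sketch of that pattern with the standard library's `argparse` follows. It is illustrative only: the policy server's real argument parser may be built differently, and `resolve_format_prompt_as_json` and the `--no-format-prompt-as-json` negation are assumptions introduced here.

```python
import argparse
from typing import Optional, Sequence


def resolve_format_prompt_as_json(argv: Sequence[str], config_value: bool) -> bool:
    """Return the CLI override if one was given, else the experiment-config value."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--format-prompt-as-json",
        action=argparse.BooleanOptionalAction,  # also accepts --no-format-prompt-as-json
        default=None,  # None means "not specified on the command line"
    )
    args = parser.parse_args(argv)
    override: Optional[bool] = args.format_prompt_as_json
    return config_value if override is None else override


# No flag: fall back to what the checkpoint was trained with.
from_config = resolve_format_prompt_as_json([], config_value=True)
# Explicit flags win over the config in either direction.
forced_off = resolve_format_prompt_as_json(["--no-format-prompt-as-json"], config_value=True)
forced_on = resolve_format_prompt_as_json(["--format-prompt-as-json"], config_value=False)
```

Keeping `None` distinct from `False` is what lets an unspecified flag defer to the training-time setting, so eval prompts match training prompts by default.
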
Plumb format_prompt_as_json through get_action_libero_sft_dataset to the
ActionTransformPipeline, and default the action_policy_libero_nano recipe to
structured JSON prompts (set False for plain text).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Equal 1:1:1:1 mix over libero_spatial/object/goal/10 on the public Cosmos3-Nano
base with JSON prompts; LIBERO_ROOT is the LIBERO_LeRobot_v3 parent dir. Same
recipe as action_policy_libero_nano otherwise.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
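
One way to read the "equal 1:1:1:1 mix" above is that each draw first picks a suite uniformly and then picks a sample inside it, so larger suites are not over-represented. The sketch below illustrates that reading only. The actual dataloader may implement the mix differently, `sample_equal_mix` is a hypothetical helper, and the suite sizes are made-up toy numbers.

```python
import random
from collections import Counter
from typing import Dict, List, Tuple


def sample_equal_mix(suite_sizes: Dict[str, int], num_samples: int, seed: int = 0) -> List[Tuple[str, int]]:
    """Draw (suite, index) pairs with equal probability per suite, regardless of size."""
    rng = random.Random(seed)
    names = sorted(suite_sizes)
    draws = []
    for _ in range(num_samples):
        suite = rng.choice(names)  # 1:1:1:1 over suites
        draws.append((suite, rng.randrange(suite_sizes[suite])))  # uniform within the suite
    return draws


# Toy sizes, deliberately unequal; these are NOT the real LIBERO episode counts.
suite_sizes = {"libero_spatial": 100, "libero_object": 200, "libero_goal": 300, "libero_10": 400}
draws = sample_equal_mix(suite_sizes, num_samples=8000)
counts = Counter(suite for suite, _ in draws)
print(counts)  # each suite lands near 2000 draws despite the size differences
```
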
… (B)

Both lr 5e-5 / warmup 500 / cycle 16000 / gbs 2048 / JSON prompts. Preset A
(libero_10-only, max_iter 2000) peaks ~1500. Preset B (libero-all equal 4-suite,
max_iter 5000) is coverage-limited but reaches the best libero_10 SR (~95.6%).
Adds the libero-all TOML + launcher; doc documents both.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Fixes pre-commit check-shebang-scripts-are-executable.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
fwd4 requested a review from pengcuo July 2, 2026 06:09
pengcuo previously approved these changes Jul 2, 2026
lfengad previously approved these changes Jul 2, 2026
fwd4 and others added 2 commits July 2, 2026 15:22
…policy-sft

# Conflicts:
#	cosmos_framework/data/vfm/action/datasets/__init__.py
#	cosmos_framework/data/vfm/action/datasets/base_dataset.py
#	cosmos_framework/data/vfm/action/normalizer_stats/libero_native_frame_wise_relative_rot6d.json
Merged origin/main (i4 dataset port renamed stats/ -> normalizer_stats/ and
rewrote base_dataset). Point libero's normalizer dir + doc at normalizer_stats/;
ruff import cleanup; realign doc table.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
fwd4 dismissed stale reviews from lfengad and pengcuo via 8e9f8f3 July 2, 2026 07:23
fwd4 merged commit 33ea6a7 into NVIDIA:main Jul 2, 2026
8 checks passed

6 participants