Commit a07f22e

pkooij and s1lent4gnt authored
feat(envs): add LIBERO-plus robustness benchmark (huggingface#3313)
* feat(envs): add LIBERO-plus robustness benchmark integration

- LiberoPlusEnv config (subclass of LiberoEnv, same gym interface)
- Docker image installing LIBERO-plus fork via PYTHONPATH
- CI workflow: 1-episode smoke eval with pepijn223/smolvla_libero_plus
- pyproject.toml: libero_plus extra

* fix(libero): use suite's perturbation-aware init_states loader

LIBERO-plus's Benchmark class exposes a `get_task_init_states(i)` method that strips perturbation suffixes (`_table_N`, `_tb_N`, `_view_`, `_language_`, `_light_`, `_add_`, `_level`) and loads the underlying base `.pruned_init` file — the on-disk name for a perturbation variant doesn't exist as a file, only the base does. lerobot's loader was bypassing that logic and trying to read the suffix-bearing filename directly, which failed for every non-zero task id and killed the eval before any rollout video could be written.

Delegate to the suite's method when it exists; fall back to the path-based loader for vanilla LIBERO (which does not provide the method).

Also drop the hf-libero install + init_files copy from the LIBERO-plus Dockerfile — the LIBERO-plus clone already ships both `bddl_files/` and `init_files/` for all five suites, so the copy was unnecessary and the `cp -r` into an existing dir produced a confusing nested layout.

* fix(libero): resolve LIBERO-plus perturbation init_states path ourselves

Delegating to `task_suite.get_task_init_states(i)` works for path resolution but LIBERO-plus's method calls `torch.load(path)` without `weights_only=False`, which fails on PyTorch 2.6+ because the pickled init_states contains numpy objects not in the default allowlist:

    _pickle.UnpicklingError: Weights only load failed.
    WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray._reconstruct was not an allowed global.

Mirror LIBERO-plus's suffix-stripping logic (`_table_N`, `_tb_N`, `_view_`, `_language_`, `_light_`, `_add_`, `_level`) in our own helper so we can pass `weights_only=False` ourselves. Vanilla LIBERO task names don't contain any of these patterns except for `_table_` when followed by the word `center` (e.g. `pick_up_the_black_bowl_from_table_center_...`), and the regex requires `_table_\d+` so semantic uses are preserved.

* fix(libero-plus): download perturbation assets from Sylvest/LIBERO-plus

LIBERO-plus's bddl_base_domain.py resolves scene XMLs with `os.path.join(DIR_PATH, "../assets")`, so the `assets` key in config.yaml has no effect on scene lookup — MuJoCo always opens `<clone>/libero/libero/assets/scenes/...`. With no such directory present, every perturbation task fails on:

    FileNotFoundError: No such file or directory: .../libero-plus/libero/libero/assets/scenes/tabletop_table_Cobblestone01_GLOSS_6K.xml

These textures, views, and extra objects ship only in the 6.4 GB `assets.zip` published at `Sylvest/LIBERO-plus` (the LIBERO-plus README explicitly says to download and unzip it into the package dir). Fetch it via `hf_hub_download`, unzip into `${LIBERO_PLUS_ROOT}/`, install `unzip`, and point config.yaml at the extracted dir so everything stays consistent. The download lives in its own Docker layer so subsequent rebuilds reuse the cached assets.

Drops the lerobot/libero-assets snapshot_download — that mirror only has vanilla LIBERO textures and is ignored for scene loading anyway.

* fix(libero-plus): flatten deep path prefix from Sylvest/LIBERO-plus assets.zip

The 6.4 GB zip ships with every entry prefixed by `inspire/hdd/project/embodied-multimodality/public/syfei/libero_new/release/dataset/LIBERO-plus-0/assets/...` (the author's internal filesystem layout, not the layout the LIBERO-plus README promises), so the previous `unzip -d ${LIBERO_PLUS_ROOT}/` created `${LIBERO_PLUS_ROOT}/inspire/.../assets/` — robosuite still opened `${LIBERO_PLUS_ROOT}/assets/scenes/tabletop_table_Cobblestone01_GLOSS_6K.xml` and hit the same FileNotFoundError.

Extract to a scratch dir, then `mv` the nested `assets/` subtree to the expected location. Verified the target file exists in the zip central directory under that exact prefix.

* refactor(libero): inline init_states resolver behind single regex

Collapse the three-style suffix stripper (split/re.sub/in) into one compiled regex, drop the (Path, bool) tuple return, and move the `_add_`/`_level` reshape branch into the caller so each branch loads its own file and returns directly. Net: -11 lines, one fewer helper.

* refactor(libero-plus): rebase docker image on huggingface/lerobot-gpu

Mirror the libero/metaworld/robomme pattern: start from the nightly GPU image (apt deps, python, uv, venv, lerobot[all] already there) and only layer on what LIBERO-plus uniquely needs — its wand/ImageMagick build deps, the non-extra runtime pips (robosuite==1.4.1, bddl, …), the PYTHONPATH-shadowed fork, and the 6.4 GB assets.zip. Drops ~50 lines of duplicated base setup (CUDA FROM, apt python, uv install, user creation, venv init) the nightly already provides. 123 → 73 lines.

Also:
- Add libero_plus to docs/source/_toctree.yml under Benchmarks so doc-builder's TOC integrity check stops failing.
- Repoint the docs dataset link from pepijn223/libero_plus_lerobot to the canonical lerobot/libero_plus.
- Revert the stray uv.lock churn (revision/marker diff that crept in from an unrelated resolve — unrelated to LIBERO-plus).

* fix(libero-plus): stop touching pyproject + uv.lock

The fast-tests job was rejecting the branch because pyproject.toml had a [libero_plus] extra whose git dep wasn't represented in uv.lock. The Docker image no longer needs the extra — it clones LIBERO-plus directly and PYTHONPATH-shadows hf-libero. Drop [libero_plus] from pyproject and restore pyproject.toml + uv.lock to exactly what's on origin/main, so `uv sync --locked --extra test` is a no-op for this PR.

Also repoint the doc/CI/env comments that still mentioned the extra at the Docker install path.

* fix(libero-plus): strip perturbation metadata from task descriptions

LIBERO-plus builds task.language by space-joining the perturbation-variant filename, so every non-_language_ variant inherits a trailing blob like "view 0 0 100 0 0 initstate 0 noise 45" or "add 16". That shows up in the dashboard video labels and no longer matches the base instruction stored in the training dataset.

Strip those tokens in extract_task_descriptions.py with an end-anchored regex over the {view,initstate,noise,add,tb,table,light,level}(+digits) vocabulary. The anchor preserves mid-sentence literal uses of those words (e.g. "from table center and place it on the plate") — only the trailing metadata chain is removed. _language_ variants carry real BDDL-sourced text and are left untouched.

* ci: point benchmark eval checkpoints at the lerobot/ org mirrors

pepijn223/smolvla_* → lerobot/smolvla_* across every benchmark job in this branch (libero, metaworld, and the per-branch benchmark). The checkpoints were mirrored into the lerobot/ org and that's the canonical location going forward.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: integrate PR huggingface#3313 review feedback

- docs: fix paper link to arxiv, add benchmark image, add suite descriptions, add LIBERO-plus replacement warning, restructure eval section to match LIBERO doc style, fix policy I/O section, remove false try/except claim
- docker: fix shell grouping for hf-libero uninstall, replace hardcoded asset path with dynamic find
- ci: add Docker Hub login step, add HF_USER_TOKEN guard on eval step
- envs: add is_libero_plus param to get_task_init_states so vanilla LIBERO always takes the simple path

* fix(docs): use correct LIBERO-plus teaser image URL

* ci(libero-plus): drop redundant hf auth login step

The standalone login step ran `hf auth login` in a throwaway `docker run --rm` container, so no credentials persisted. Auth is already performed inside the eval step's container. Removing the redundant step per PR huggingface#3313 review feedback.

* fix(envs): preserve AsyncVectorEnv metadata/unwrapped in lazy eval envs

Port of huggingface#3416 onto this branch. Without these attributes eval crashes when calling `env.unwrapped.metadata["render_fps"]` with async vector envs. Adds `metadata` / `unwrapped` to `_LazyAsyncVectorEnv` and caches the metadata alongside obs/action spaces in the LIBERO and MetaWorld factories.

* ci: gate Docker Hub login on secret availability

Fork PRs cannot access `secrets.DOCKERHUB_LEROBOT_{USERNAME,PASSWORD}`, which made every benchmark job fail at the login step before any of the actual build/eval work could run. Gate the login on the env-var expansion of the username so the step is skipped (not failed) when secrets are absent. Mirrors the existing pattern in the VLABench job.

* Update .github/workflows/benchmark_tests.yml

Co-authored-by: Khalil Meftah <khalil.meftah@huggingface.co>
Signed-off-by: Pepijn <138571049+pkooij@users.noreply.github.com>

* Update scripts/ci/extract_task_descriptions.py

Co-authored-by: Khalil Meftah <khalil.meftah@huggingface.co>
Signed-off-by: Pepijn <138571049+pkooij@users.noreply.github.com>

* Update .github/workflows/benchmark_tests.yml

Co-authored-by: Khalil Meftah <khalil.meftah@huggingface.co>
Signed-off-by: Pepijn <138571049+pkooij@users.noreply.github.com>

* Update docker/Dockerfile.benchmark.libero_plus

Co-authored-by: Khalil Meftah <khalil.meftah@huggingface.co>
Signed-off-by: Pepijn <138571049+pkooij@users.noreply.github.com>

* Update .github/workflows/benchmark_tests.yml

Co-authored-by: Khalil Meftah <khalil.meftah@huggingface.co>
Signed-off-by: Pepijn <138571049+pkooij@users.noreply.github.com>

* fix(libero-plus): address review feedback

* ci(libero-plus): fix YAML indentation in upload-artifact steps

The `uses:` key on two upload-artifact steps was at column 0 instead of nested under the step, causing `pre-commit run check-yaml` to fail with "expected <block end>, but found '<block mapping start>'".

Signed-off-by: Pepijn <138571049+pkooij@users.noreply.github.com>
Co-authored-by: Khalil Meftah <khalil.meftah@huggingface.co>
1 parent 282c31c commit a07f22e
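
The init_states path resolution described in the commit message (strip the perturbation suffix, then load the base `.pruned_init` file) can be illustrated with a single compiled regex. This is only a sketch under stated assumptions: `base_init_filename` and the sample task names are hypothetical, truncating at the first marker is an assumption about how the suffix is removed, and the helper actually merged in lerobot may differ. The loading step (`torch.load(..., weights_only=False)`) is left out.

```python
import re

# Perturbation markers named in the commit message. `_table_` and `_tb_` must be
# followed by digits, so semantic uses such as `_table_center_` are preserved.
# Assumption: everything from the first marker to the end of the name is metadata.
_PERTURBATION_SUFFIX = re.compile(
    r"(?:_table_\d+|_tb_\d+|_view_|_language_|_light_|_add_|_level).*$"
)


def base_init_filename(task_name: str) -> str:
    """Return the base `.pruned_init` filename for a (possibly perturbed) task name.

    Hypothetical helper for illustration only.
    """
    base = _PERTURBATION_SUFFIX.sub("", task_name)
    return f"{base}.pruned_init"


if __name__ == "__main__":
    # Hypothetical task names, not taken from the LIBERO-plus repository.
    for name in (
        "pick_up_the_black_bowl_from_table_center_and_place_it_on_the_plate_view_0_0_100",
        "open_the_top_drawer_table_3",
        "open_the_top_drawer",
    ):
        print(name, "->", base_init_filename(name))
```

Requiring digits after `_table_` is what keeps `from_table_center` intact while still catching `_table_3`.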

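
The end-anchored stripping of trailing perturbation metadata from task descriptions, as described in the commit message, might look roughly like the following. It is a sketch, not the code in `scripts/ci/extract_task_descriptions.py`: the function name is hypothetical, and the pattern assumes each metadata keyword is followed by one or more whitespace-separated integers.

```python
import re

# Keyword vocabulary listed in the commit message. Assumption: every metadata
# keyword is followed by at least one integer, and the chain runs to the end
# of the string.
_TRAILING_METADATA = re.compile(
    r"(?:\s+(?:view|initstate|noise|add|tb|table|light|level)(?:\s+\d+)+)+\s*$"
)


def strip_perturbation_metadata(description: str) -> str:
    """Remove a trailing chain such as 'view 0 0 100 0 0 initstate 0 noise 45'.

    Hypothetical helper for illustration only.
    """
    return _TRAILING_METADATA.sub("", description)


if __name__ == "__main__":
    samples = [
        "pick up the black bowl from table center and place it on the plate "
        "view 0 0 100 0 0 initstate 0 noise 45",
        "put the bowl on the plate add 16",
        "put the bowl on the table",
    ]
    for text in samples:
        print(repr(strip_perturbation_metadata(text)))
```

Because the pattern is anchored at the end and requires digits after each keyword, a literal "table" in the middle or at the end of an instruction is left alone.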
7 files changed

Lines changed: 466 additions & 11 deletions

.github/workflows/benchmark_tests.yml

Lines changed: 107 additions & 0 deletions
@@ -736,3 +736,110 @@ jobs:
           name: robomme-metrics
           path: /tmp/robomme-artifacts/metrics.json
           if-no-files-found: warn
+
+  # ── LIBERO-plus ───────────────────────────────────────────────────────────
+  # Isolated image: LIBERO-plus fork cloned into /home/user_lerobot on top of
+  # huggingface/lerobot-gpu (see docker/Dockerfile.benchmark.libero_plus).
+  libero-plus-integration-test:
+    name: LIBERO-plus — build image + 1-episode eval
+    runs-on:
+      group: aws-g6-4xlarge-plus
+    env:
+      HF_USER_TOKEN: ${{ secrets.LEROBOT_HF_USER }}
+      LIBERO_PLUS_SUITE: libero_spatial
+      LIBERO_PLUS_POLICY: lerobot/smolvla_libero_plus
+      LIBERO_PLUS_TASK_IDS: "[0,100,260,500,1000,1500,2000,2400]"
+
+    steps:
+      - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
+        with:
+          persist-credentials: false
+          lfs: true
+
+      - name: Set up Docker Buildx
+        uses: docker/setup-buildx-action@v3 # zizmor: ignore[unpinned-uses]
+        with:
+          cache-binary: false
+
+      - name: Login to Docker Hub
+        if: ${{ env.DOCKERHUB_USERNAME != '' }}
+        uses: docker/login-action@v3 # zizmor: ignore[unpinned-uses]
+        with:
+          username: ${{ secrets.DOCKERHUB_LEROBOT_USERNAME }}
+          password: ${{ secrets.DOCKERHUB_LEROBOT_PASSWORD }}
+        env:
+          DOCKERHUB_USERNAME: ${{ secrets.DOCKERHUB_LEROBOT_USERNAME }}
+
+      - name: Build LIBERO-plus benchmark image
+        uses: docker/build-push-action@v6 # zizmor: ignore[unpinned-uses]
+        with:
+          context: .
+          file: docker/Dockerfile.benchmark.libero_plus
+          push: false
+          load: true
+          tags: lerobot-benchmark-libero-plus:ci
+          cache-from: type=local,src=/tmp/.buildx-cache-libero-plus
+          cache-to: type=local,dest=/tmp/.buildx-cache-libero-plus,mode=max
+
+      - name: Run LIBERO-plus smoke eval (1 episode)
+        if: env.HF_USER_TOKEN != ''
+        run: |
+          docker run --name libero-plus-eval --gpus all \
+            --shm-size=4g \
+            -e HF_HOME=/tmp/hf \
+            -e HF_USER_TOKEN="${HF_USER_TOKEN}" \
+            -e HF_HUB_DOWNLOAD_TIMEOUT=300 \
+            -e LIBERO_PLUS_SUITE="${LIBERO_PLUS_SUITE}" \
+            -e LIBERO_PLUS_POLICY="${LIBERO_PLUS_POLICY}" \
+            -e LIBERO_PLUS_TASK_IDS="${LIBERO_PLUS_TASK_IDS}" \
+            lerobot-benchmark-libero-plus:ci \
+            bash -c "
+              hf auth login --token \"\$HF_USER_TOKEN\" --add-to-git-credential 2>/dev/null || true
+              lerobot-eval \
+                --policy.path=\"\$LIBERO_PLUS_POLICY\" \
+                --env.type=libero_plus \
+                --env.task=\"\$LIBERO_PLUS_SUITE\" \
+                --env.task_ids=\"\$LIBERO_PLUS_TASK_IDS\" \
+                --eval.batch_size=1 \
+                --eval.n_episodes=1 \
+                --eval.use_async_envs=false \
+                --policy.device=cuda \
+                '--env.camera_name_mapping={\"agentview_image\": \"camera1\", \"robot0_eye_in_hand_image\": \"camera2\"}' \
+                --policy.empty_cameras=1 \
+                --output_dir=/tmp/eval-artifacts
+              python scripts/ci/extract_task_descriptions.py \
+                --env libero_plus --task \"\$LIBERO_PLUS_SUITE\" \
+                --output /tmp/eval-artifacts/task_descriptions.json
+            "
+
+      - name: Copy LIBERO-plus artifacts from container
+        if: always()
+        run: |
+          mkdir -p /tmp/libero-plus-artifacts
+          docker cp libero-plus-eval:/tmp/eval-artifacts/. /tmp/libero-plus-artifacts/ 2>/dev/null || true
+          docker rm -f libero-plus-eval || true
+
+      - name: Parse LIBERO-plus eval metrics
+        if: always()
+        run: |
+          python3 scripts/ci/parse_eval_metrics.py \
+            --artifacts-dir /tmp/libero-plus-artifacts \
+            --env libero_plus \
+            --task "${LIBERO_PLUS_SUITE}" \
+            --policy "${LIBERO_PLUS_POLICY}"
+
+      - name: Upload LIBERO-plus rollout video
+        if: always()
+        uses: actions/upload-artifact@v4 # zizmor: ignore[unpinned-uses]
+        with:
+          name: libero-plus-rollout-video
+          path: /tmp/libero-plus-artifacts/videos/
+          if-no-files-found: warn
+
+      - name: Upload LIBERO-plus eval metrics
+        if: always()
+        uses: actions/upload-artifact@v4 # zizmor: ignore[unpinned-uses]
+        with:
+          name: libero-plus-metrics
+          path: /tmp/libero-plus-artifacts/metrics.json
+          if-no-files-found: warn

docker/Dockerfile.benchmark.libero_plus

Lines changed: 84 additions & 0 deletions
@@ -0,0 +1,84 @@
+# Copyright 2026 The HuggingFace Inc. team. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+# Benchmark image for LIBERO-plus integration tests.
+# Extends the nightly GPU image (which has lerobot[all]) with the LIBERO-plus
+# fork source + its 6.4 GB perturbation assets.
+#
+# Build: docker build -f docker/Dockerfile.benchmark.libero_plus -t lerobot-benchmark-libero-plus .
+# Run: docker run --gpus all --rm lerobot-benchmark-libero-plus lerobot-eval ...
+
+FROM huggingface/lerobot-gpu:latest
+ENV MUJOCO_GL=egl
+
+# unzip for the 6.4 GB assets.zip; the rest are LIBERO-plus build-time extras
+# (wand / ImageMagick / fontconfig) not in the nightly base.
+USER root
+RUN apt-get update \
+    && apt-get install -y --no-install-recommends \
+    unzip libexpat1 libfontconfig1-dev libmagickwand-dev \
+    && apt-get clean && rm -rf /var/lib/apt/lists/*
+USER user_lerobot
+
+# robosuite==1.4.1 is mandatory (the fork uses `single_arm_env` removed in
+# v1.5+). The rest are LIBERO-plus runtime deps pulled from its setup.py.
+# We install these explicitly instead of via the [libero_plus] extra because
+# the extra's `libero @ git+...` dep installs as a namespace package that pip
+# can't handle, so we clone the fork and PYTHONPATH-override it below.
+RUN uv pip install --no-cache \
+    "robosuite==1.4.1" \
+    "bddl==1.0.1" \
+    "easydict==1.13" \
+    "mujoco==3.7.0" \
+    "matplotlib==3.10.8" \
+    "Wand==0.6.13" \
+    "scikit-image==0.25.2" \
+    "gym==0.26.2"
+
+# Clone LIBERO-plus and make it importable as `libero`. The nightly base has
+# hf-libero (10 tasks) preinstalled via lerobot[libero]; uninstall it so
+# Python resolves `import libero` to the 2402-task LIBERO-plus module instead.
+# Pinned to the current upstream main SHA so benchmark builds stay reproducible.
+ARG LIBERO_PLUS_SHA=4976dc3
+ENV LIBERO_PLUS_ROOT=/home/user_lerobot/libero-plus/libero/libero
+RUN git clone https://github.com/sylvestf/LIBERO-plus.git /home/user_lerobot/libero-plus \
+    && git -C /home/user_lerobot/libero-plus checkout ${LIBERO_PLUS_SHA} \
+    && cd /home/user_lerobot/libero-plus && uv pip install --no-cache --no-deps -e "." \
+    && (uv pip uninstall hf-libero 2>/dev/null || true)
+ENV PYTHONPATH="/home/user_lerobot/libero-plus:${PYTHONPATH}"
+
+# Perturbation textures/scenes: bddl_base_domain.py resolves XMLs via
+# DIR_PATH/../assets (package-relative, ignoring ~/.libero/config.yaml). All
+# 2402 tasks reference files that ship only in Sylvest/LIBERO-plus's
+# assets.zip (6.4 GB) under a deep author-internal prefix — extract and
+# flatten it under ${LIBERO_PLUS_ROOT}/assets.
+RUN python -c "\
+from huggingface_hub import hf_hub_download; \
+hf_hub_download(repo_id='Sylvest/LIBERO-plus', repo_type='dataset', \
+    filename='assets.zip', local_dir='/tmp/libero-plus-dl')" \
+    && unzip -q /tmp/libero-plus-dl/assets.zip -d /tmp/libero-plus-dl/extract \
+    && ASSETS_DIR=$(find /tmp/libero-plus-dl/extract -type d -name assets | head -1) \
+    && mv "${ASSETS_DIR}" ${LIBERO_PLUS_ROOT}/assets \
+    && rm -rf /tmp/libero-plus-dl
+
+# Point ~/.libero/config.yaml at the clone so LIBERO-plus's imports are
+# non-interactive (it calls input() when the config is missing).
+RUN mkdir -p /home/user_lerobot/.libero \
+    && printf "assets: ${LIBERO_PLUS_ROOT}/assets\nbddl_files: ${LIBERO_PLUS_ROOT}/bddl_files\ndatasets: ${LIBERO_PLUS_ROOT}/../datasets\ninit_states: ${LIBERO_PLUS_ROOT}/init_files\n" \
+    > /home/user_lerobot/.libero/config.yaml
+
+# Overlay the PR's source code on top of the nightly image.
+COPY --chown=user_lerobot:user_lerobot . .
+
+CMD ["/bin/bash"]

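The extract-then-flatten step in the Dockerfile above (unzip to a scratch directory, locate the nested `assets/` directory, move it into place) can also be expressed in Python for local installs. This is a minimal sketch under assumptions: `extract_and_flatten` is a hypothetical helper, it picks the first `assets` directory in sorted path order (the Dockerfile takes whichever `find` lists first), and it refuses to overwrite an existing target to avoid the nested layout described in the commit message.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def extract_and_flatten(zip_path: Path, target_assets: Path) -> Path:
    """Extract `zip_path` and move its nested `assets/` directory to `target_assets`.

    Hypothetical helper for illustration only.
    """
    if target_assets.exists():
        # Moving into an existing directory would nest assets/ inside it.
        raise FileExistsError(f"{target_assets} already exists")
    with tempfile.TemporaryDirectory() as scratch:
        with zipfile.ZipFile(zip_path) as archive:
            archive.extractall(scratch)
        # Find every directory named `assets`, however deep the prefix is.
        candidates = sorted(p for p in Path(scratch).rglob("assets") if p.is_dir())
        if not candidates:
            raise FileNotFoundError("no 'assets' directory found in the archive")
        target_assets.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(candidates[0]), str(target_assets))
    return target_assets
```

A call such as `extract_and_flatten(Path("assets.zip"), Path(libero_plus_root) / "assets")` would mirror the Dockerfile layout, where `libero_plus_root` stands for the directory the Dockerfile calls `LIBERO_PLUS_ROOT`.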
docs/source/_toctree.yml

Lines changed: 2 additions & 0 deletions
@@ -77,6 +77,8 @@
     title: Adding a New Benchmark
   - local: libero
     title: LIBERO
+  - local: libero_plus
+    title: LIBERO-plus
   - local: metaworld
     title: Meta-World
   - local: robotwin

docs/source/libero_plus.mdx

Lines changed: 188 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,188 @@
1+
# LIBERO-plus
2+
3+
LIBERO-plus is a **robustness benchmark** for Vision-Language-Action (VLA) models built on top of [LIBERO](./libero). It systematically stress-tests policies by applying **seven independent perturbation dimensions** to the original LIBERO task set, exposing failure modes that standard benchmarks miss.
4+
5+
- Paper: [In-depth Robustness Analysis of Vision-Language-Action Models](https://arxiv.org/abs/2510.13626)
6+
- GitHub: [sylvestf/LIBERO-plus](https://github.com/sylvestf/LIBERO-plus)
7+
- Dataset: [lerobot/libero_plus](https://huggingface.co/datasets/lerobot/libero_plus)
8+
9+
![An overview of the LIBERO-plus benchmark perturbation dimensions](https://github.com/sylvestf/LIBERO-plus/raw/main/static/images/libero-plus.jpg)
10+
11+
## Perturbation dimensions
12+
13+
LIBERO-plus creates ~10 000 task variants by perturbing each original LIBERO task along these axes:
14+
15+
| Dimension | What changes |
16+
| --------------------- | ----------------------------------------------------- |
17+
| Objects layout | Target position, presence of confounding objects |
18+
| Camera viewpoints | Camera position, orientation, field-of-view |
19+
| Robot initial states | Manipulator start pose |
20+
| Language instructions | LLM-rewritten task description (paraphrase / synonym) |
21+
| Light conditions | Intensity, direction, color, shadow |
22+
| Background textures | Scene surface and object appearance |
23+
| Sensor noise | Photometric distortions and image degradation |
24+
25+
## Available task suites
26+
27+
LIBERO-plus covers the same five suites as LIBERO:
28+
29+
| Suite | CLI name | Tasks | Max steps | Description |
30+
| -------------- | ---------------- | ----- | --------- | -------------------------------------------------- |
31+
| LIBERO-Spatial | `libero_spatial` | 10 | 280 | Tasks requiring reasoning about spatial relations |
32+
| LIBERO-Object | `libero_object` | 10 | 280 | Tasks centered on manipulating different objects |
33+
| LIBERO-Goal | `libero_goal` | 10 | 300 | Goal-conditioned tasks with changing targets |
34+
| LIBERO-90 | `libero_90` | 90 | 400 | Short-horizon tasks from the LIBERO-100 collection |
35+
| LIBERO-Long | `libero_10` | 10 | 520 | Long-horizon tasks from the LIBERO-100 collection |
36+
37+
<Tip warning={true}>
38+
Installing LIBERO-plus **replaces** vanilla LIBERO — it uninstalls `hf-libero`
39+
so that `import libero` resolves to the LIBERO-plus fork. You cannot have both
40+
installed at the same time. To switch back to vanilla LIBERO, uninstall the
41+
fork and reinstall with `pip install -e ".[libero]"`.
42+
</Tip>
43+
+## Installation
+
+### System dependencies (Linux only)
+
+```bash
+sudo apt install libexpat1 libfontconfig1-dev libmagickwand-dev
+```
+
+### Python package
+
+```bash
+pip install -e ".[libero]" "robosuite==1.4.1" bddl easydict mujoco wand scikit-image gym
+git clone https://github.com/sylvestf/LIBERO-plus.git
+cd LIBERO-plus && pip install --no-deps -e .
+pip uninstall -y hf-libero # so `import libero` resolves to the fork
+```
+
+LIBERO-plus is installed from its GitHub fork rather than a pyproject extra: the fork ships as a namespace package that pip can't handle, so it must be cloned and added to `PYTHONPATH`. See `docker/Dockerfile.benchmark.libero_plus` for the canonical install. MuJoCo is required, so only Linux is supported.
+
+<Tip>
+Set the MuJoCo rendering backend before running evaluation:
+
+```bash
+export MUJOCO_GL=egl # headless / HPC / cloud
+```
+
+</Tip>
+
+### Download LIBERO-plus assets
+
+LIBERO-plus ships its extended asset pack separately. Download `assets.zip` from the [Hugging Face dataset](https://huggingface.co/datasets/Sylvest/LIBERO-plus/tree/main) and extract it into the LIBERO-plus package directory:
+
+```bash
+# After installing the package, find where it was installed:
+python -c "import libero; print(libero.__file__)"
+# Then extract assets.zip into <package_root>/libero/assets/ (the archive nests assets/ under a long path prefix, so move that directory into place)
+```
+
+## Evaluation
+
+### Default evaluation (recommended)
+
+Evaluate across the four standard suites (10 episodes per task):
+
+```bash
+lerobot-eval \
+  --policy.path="your-policy-id" \
+  --env.type=libero_plus \
+  --env.task=libero_spatial,libero_object,libero_goal,libero_10 \
+  --eval.batch_size=1 \
+  --eval.n_episodes=10 \
+  --env.max_parallel_tasks=1
+```
+
+### Single-suite evaluation
+
+Evaluate on one LIBERO-plus suite:
+
+```bash
+lerobot-eval \
+  --policy.path="your-policy-id" \
+  --env.type=libero_plus \
+  --env.task=libero_spatial \
+  --eval.batch_size=1 \
+  --eval.n_episodes=10
+```
+
+- `--env.task` picks the suite (`libero_spatial`, `libero_object`, etc.).
+- `--env.task_ids` restricts to specific task indices (`[0]`, `[1,2,3]`, etc.). Omit to run all tasks in the suite.
+- `--eval.batch_size` controls how many environments run in parallel.
+- `--eval.n_episodes` sets how many episodes to run per task.
+
+### Multi-suite evaluation
+
+Benchmark a policy across multiple suites at once by passing a comma-separated list:
+
+```bash
+lerobot-eval \
+  --policy.path="your-policy-id" \
+  --env.type=libero_plus \
+  --env.task=libero_spatial,libero_object \
+  --eval.batch_size=1 \
+  --eval.n_episodes=10
+```
+
+### Control mode
+
+LIBERO-plus supports two control modes: `relative` (default) and `absolute`. Different VLA checkpoints are trained with different action parameterizations, so make sure the mode matches your policy:
+
+```bash
+--env.control_mode=relative # or "absolute"
+```
+
+### Policy inputs and outputs
+
+**Observations:**
+
+- `observation.state` — 8-dim proprioceptive features (eef position, axis-angle orientation, gripper qpos)
+- `observation.images.image` — main camera view (`agentview_image`), HWC uint8
+- `observation.images.image2` — wrist camera view (`robot0_eye_in_hand_image`), HWC uint8
+
+**Actions:**
+
+- Continuous control in `Box(-1, 1, shape=(7,))` — 6D end-effector delta + 1D gripper
+
+### Recommended evaluation episodes
+
+For reproducible benchmarking, use **10 episodes per task** across all four standard suites (Spatial, Object, Goal, Long). Counting only the 10 base tasks in each suite, this gives 400 total episodes and matches the protocol used for published results; running the perturbation variants as well raises the total accordingly.
+
+## Training
+
+### Dataset
+
+A LeRobot-format training dataset for LIBERO-plus is available at:
+
+- [lerobot/libero_plus](https://huggingface.co/datasets/lerobot/libero_plus)
+
+### Example training command
+
+```bash
+lerobot-train \
+  --policy.type=smolvla \
+  --policy.repo_id=${HF_USER}/smolvla_libero_plus \
+  --policy.load_vlm_weights=true \
+  --dataset.repo_id=lerobot/libero_plus \
+  --env.type=libero_plus \
+  --env.task=libero_spatial \
+  --output_dir=./outputs/ \
+  --steps=100000 \
+  --batch_size=4 \
+  --eval.batch_size=1 \
+  --eval.n_episodes=1 \
+  --eval_freq=1000
+```
+
+## Relationship to LIBERO
+
+LIBERO-plus is a drop-in extension of LIBERO:
+
+- Same Python gym interface (`LiberoEnv`, `LiberoProcessorStep`)
+- Same camera names and observation/action format
+- Same task suite names
+- Installs under the same `libero` Python package name (different GitHub repo)
+
+To use the original LIBERO benchmark, see [LIBERO](./libero) and use `--env.type=libero`.

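The 400-episode figure in the "Recommended evaluation episodes" section of `libero_plus.mdx` follows from the per-suite task counts in the suite table. The arithmetic can be checked with a few lines of Python; this is a sketch, the names are illustrative, and the counts cover only the tasks listed in that table, not the perturbation variants.

```python
# Task counts per suite, copied from the suite table in libero_plus.mdx
# (the four standard suites only; LIBERO-90 is not part of the protocol).
TASKS_PER_SUITE = {
    "libero_spatial": 10,
    "libero_object": 10,
    "libero_goal": 10,
    "libero_10": 10,
}
EPISODES_PER_TASK = 10


def total_episodes(tasks_per_suite: dict[str, int], episodes_per_task: int) -> int:
    """Total rollouts when every listed task is run `episodes_per_task` times."""
    return sum(tasks_per_suite.values()) * episodes_per_task


if __name__ == "__main__":
    print(total_episodes(TASKS_PER_SUITE, EPISODES_PER_TASK))  # 40 tasks * 10 = 400
```

The same function gives the size of the CI smoke eval: eight task ids at one episode each is eight rollouts.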