SkillBlenderLC

SkillBlenderLC extends SkillBlender by adding an LLM-conditioned pipeline that can:

  • take a natural-language task,
  • decompose it into subtasks and required skills,
  • automatically generate reward functions for missing skills,
  • refine those rewards using feedback from RL training.

Overview

System Overview

SkillBlenderLC wraps three ideas around the original SkillBlender training stack:

  1. Language-conditioned planning: skill_selection.py prompts an LLM with the current skill library (from src/config.py) to decompose a user task into sequential subtasks and skills.
  2. Automatic reward bootstrapping: skill_manager.py detects skills that are not in the library (see the sketch after this list) and calls reward_generator.py to synthesize reward code using:
    • example reward classes (envs/rewards.py)
    • humanoid environment dynamics (envs/h1_basic.py)
  3. Reward refinement loop: run_reward_refinement.py takes RL training metrics and regenerates improved reward functions, enabling self-evolving skill acquisition.
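
The snippet below is a minimal, hypothetical sketch of steps 1 and 2. The decomposition structure, key names, and skill-library contents are illustrative assumptions and may not match the exact schema produced by skill_selection.py and src/config.py.

    # Hypothetical illustration of steps 1 and 2. The decomposition structure,
    # skill names, and library contents are assumptions for illustration only;
    # the real schema comes from skill_selection.py and src/config.py.
    decomposition = {
        "task": "Button Press",
        "subtasks": [
            {"subgoal": "Walk to the button", "skill": "Walking"},
            {"subgoal": "Press the button", "skill": "Pressing"},
        ],
    }

    skill_library = {"Walking", "Squatting", "Reaching"}  # illustrative contents

    # skill_manager.py-style check: skills absent from the library need generated rewards.
    missing = [s["skill"] for s in decomposition["subtasks"]
               if s["skill"] not in skill_library]
    print(missing)  # ['Pressing'] -> reward_generator.py would synthesize its reward code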

LLM Performance — Skill Selection

Tested on the eight long-horizon tasks from the original SkillBlender paper.

Skill Selection Results

LLM Performance — Reward Generation for Primitive Skill: Squatting

Performance of the robot on the primitive skill Squatting, comparing the ground-truth reward with reward functions generated by different LLMs: GPT-4o, Claude-sonnet-4.5, Gemini-2.5-flash, Llama-3.3-70b-instruct, DeepSeek-Chat-V3-0324, and Mistral-large.

LLM Performance — Reward Generation for Long-Horizon Task: Button Press

Performance of the robot on the Button Press task after learning the new skill Pressing, comparing a ground-truth reward function with an LLM-generated one.

LLM Performance — Reward Generation for New Skills

Performance of the robot on new primitive skills: Bend Backwards, Side Stepping, and Kicking.

Quick Start

  1. Install dependencies

    pip install -r requirements.txt
  2. Configure OpenRouter: create a .env file with your OpenRouter credentials:

    OPENROUTER_API_KEY=...
    OPENROUTER_REFERRER=...
    OPENROUTER_TITLE=SkillBlenderLC
    
  3. Run a task

    python run_task.py \
      --task "Button Press" \
      --description "Walk to the button and press it"

    This generates an experiment folder containing:

    • decomposition.json – structured subtasks
    • reward_<skill>.py – reward file for each new skill

    These reward scripts can be dropped directly into the SkillBlender RL training loop.

Repository Layout

  • src/
    • config.py – skill library + prompt templates
    • skill_selection.py – LLM task decomposition
    • skill_manager.py – missing-skill detection + reward-generation calls
    • reward_generator.py – reward orchestration + cleaning
    • utils.py – experiment helpers, JSON utilities, text cleaning
  • envs/
    • h1_basic.py – humanoid environment context provided to the LLM
    • rewards.py – example reward classes for in-context learning (illustrative sketch below)
  • experiments/ – auto-generated per-task directories
  • assets/ – diagrams and images
  • skillblender/ – original SkillBlender codebase (submodule)
  • tests/ – unit tests for core modules
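
The reward classes in envs/rewards.py serve as in-context examples for the LLM. The hypothetical sketch below shows the general shape such a class might take; the class name, constructor arguments, and compute() signature are illustrative assumptions, not the repository's actual interface.

    # Hypothetical example of the kind of reward class used for in-context learning.
    # The class name, parameters, and compute() signature are assumptions for
    # illustration; see envs/rewards.py for the real examples.
    import torch


    class SquattingReward:
        """Reward tracking of a target pelvis height while staying upright."""

        def __init__(self, target_height: float = 0.55, height_sigma: float = 0.05):
            self.target_height = target_height
            self.height_sigma = height_sigma

        def compute(self, pelvis_height: torch.Tensor,
                    orientation_error: torch.Tensor) -> torch.Tensor:
            # Peak reward when the pelvis sits at the target squat height.
            height_term = torch.exp(-((pelvis_height - self.target_height) ** 2)
                                    / self.height_sigma ** 2)
            # Penalize deviation from an upright torso.
            upright_term = torch.exp(-orientation_error ** 2)
            return height_term * upright_term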

Experiment Artifacts

Each run folder under experiments/ contains:

  • decomposition.json – user task metadata, structured subtasks, and skills (known and missing)
  • reward_<skill>.py – initial reward implementation
  • reward_<skill>_v<x>.py – generated by run_reward_refinement.py after RL feedback loops
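
A quick way to inspect one of these run folders is sketched below; the folder name and the "subtasks" key are assumptions based on the layout described here, so adjust them to your actual output.

    # Sketch for inspecting a generated run folder. The folder name and the
    # "subtasks" key are assumptions; adapt them to your experiment output.
    import json
    import pathlib

    exp = pathlib.Path("experiments/buttonpress")  # example run folder
    decomposition = json.loads((exp / "decomposition.json").read_text())
    print(decomposition.get("subtasks"))           # structured subtasks, if present

    for reward_file in sorted(exp.glob("reward_*.py")):
        print(reward_file.name)                    # initial + refined reward versions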

Refining Rewards Post-Training

run_reward_refinement.py injects real RL metrics back into the LLM to produce a revised reward file and a JSON summary so you can track iterations.

python run_reward_refinement.py \
  --subgoal "Press the button" \
  --skill "Pushing" \
  --reward-file experiments/buttonpress/reward_pushing.py \
  --feedback-file experiments/buttonpress/metrics_1.json \
  --iteration 1

This command:

  • feeds the original reward plus feedback metrics to the LLM and writes reward_pushing_v1.py
  • records a structured summary (paths, metrics, iteration label) to feedback_pushing_v1.json

Pass the iteration as a bare number; for additional refinements use --iteration 2, --iteration 3, and so on, to keep a growing history of reward tweaks per skill.
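
For reference, a feedback/metrics file might look like the sketch below. Every key and value is an illustrative assumption rather than a schema documented by run_reward_refinement.py; adapt it to whatever metrics your RL run actually logs.

    # Hypothetical feedback file for the command above. All keys and values are
    # illustrative assumptions; use the metrics your own training run produces.
    import json
    import pathlib

    run_dir = pathlib.Path("experiments/buttonpress")
    run_dir.mkdir(parents=True, exist_ok=True)

    metrics = {
        "skill": "Pushing",
        "iteration": 1,
        "mean_episode_reward": 12.4,
        "success_rate": 0.31,
        "mean_episode_length": 412,
        "notes": "Reaches the button but rarely completes the press.",
    }

    with open(run_dir / "metrics_1.json", "w") as f:
        json.dump(metrics, f, indent=2)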

Notes

  • The pipeline currently assumes OpenRouter access (a minimal request sketch follows these notes).
  • Customize prompts, skill libraries, or environment snippets by editing src/config.py and envs/ files.
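
For reference, an OpenRouter request made through the OpenAI-compatible client typically looks like the sketch below, using the variables from .env. The model slug and the use of python-dotenv are examples; the project's real request logic lives in the src/ modules and may differ.

    # Minimal OpenRouter call sketch using the OpenAI-compatible API and the
    # variables from .env. The model slug is an example; the project's actual
    # request code lives in src/ and may differ.
    import os
    from openai import OpenAI
    from dotenv import load_dotenv

    load_dotenv()  # reads OPENROUTER_API_KEY, OPENROUTER_REFERRER, OPENROUTER_TITLE

    client = OpenAI(
        base_url="https://openrouter.ai/api/v1",
        api_key=os.environ["OPENROUTER_API_KEY"],
    )

    response = client.chat.completions.create(
        model="openai/gpt-4o",  # example model slug
        messages=[{"role": "user",
                   "content": "Decompose: walk to the button and press it."}],
        extra_headers={
            "HTTP-Referer": os.environ.get("OPENROUTER_REFERRER", ""),
            "X-Title": os.environ.get("OPENROUTER_TITLE", "SkillBlenderLC"),
        },
    )
    print(response.choices[0].message.content)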
