SkillBlenderLC extends SkillBlender with an LLM-conditioned pipeline that can:
- take a natural-language task,
- decompose it into subtasks and required skills,
- automatically generate reward functions for missing skills,
- refine those rewards using feedback from RL training.
SkillBlenderLC wraps three ideas around the original SkillBlender training stack:
- Language-conditioned planning – `skill_selection.py` prompts an LLM with the current skill library (from `src/config.py`) to decompose a user task into sequential subtasks and skills.
- Automatic reward bootstrapping – `skill_manager.py` detects skills that are not in the library and calls `reward_generator.py` to synthesize reward code using:
  - example reward classes (`envs/rewards.py`)
  - humanoid environment dynamics (`envs/h1_basic.py`)
- Reward refinement loop – `run_reward_refinement.py` takes RL training metrics and regenerates improved reward functions, enabling self-evolving skill acquisition. A sketch of this end-to-end flow follows below.
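The three stages chain together roughly as shown in this minimal sketch. It is for orientation only: the function names (`decompose_task`, `is_missing`, `generate_reward`) and dictionary keys are assumptions, not the actual APIs of the modules listed above.

```python
# Illustrative pipeline flow; function names and keys are assumptions.
from src import skill_selection, skill_manager, reward_generator

task = "Button Press"
description = "Walk to the button and press it"

# 1. Language-conditioned planning: the LLM decomposes the task into subtasks + skills.
decomposition = skill_selection.decompose_task(task, description)

# 2. Automatic reward bootstrapping: synthesize rewards for skills missing from the library.
for subtask in decomposition["subtasks"]:
    for skill in subtask["skills"]:
        if skill_manager.is_missing(skill):
            reward_code = reward_generator.generate_reward(skill, subtask)
            # reward_<skill>.py is written into the experiment folder for RL training.

# 3. Reward refinement: after training, run_reward_refinement.py feeds RL metrics
#    back to the LLM to produce reward_<skill>_v1.py, reward_<skill>_v2.py, ...
```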
Tested on the eight long-horizon tasks from the original SkillBlender paper.
Performance of the robot on the primitive skill *Squatting* using reward functions generated by different LLMs.

*Panels: Ground Truth · GPT-4o · Claude-sonnet-4.5 · Gemini-2.5-flash · Llama-3.3-70b-instruct · DeepSeek-Chat-V3-0324 · Mistral-large*
Performance of the robot on the task *Button Press* after learning the new skill *Pressing* using an LLM-generated reward function.

*Panels: Ground Truth · LLM Generated*
Performance of the robot on new primitives: *Bending*, *Sidestepping*, and *Kicking*.

*Panels: Bend Backwards · Side Stepping · Kicking*
- Install dependencies

  ```bash
  pip install -r requirements.txt
  ```

- Configure OpenRouter

  Create a `.env` file with your OpenRouter credentials:

  ```
  OPENROUTER_API_KEY=...
  OPENROUTER_REFERRER=...
  OPENROUTER_TITLE=SkillBlenderLC
  ```

- Run a task

  ```bash
  python run_task.py \
    --task "Button Press" \
    --description "Walk to the button and press it"
  ```
This generates an experiment folder containing:

- `decomposition.json` – structured subtasks
- `reward_<skill>.py` – reward file for the new skill
These reward scripts can be dropped directly into the SkillBlender RL training loop.
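For context, generated reward files follow the pattern of the example classes in `envs/rewards.py`. The snippet below is only a hand-written illustration of the kind of shaping terms the LLM tends to produce for a *Pressing* skill; the actual generated code and the reward interface SkillBlender expects may differ.

```python
import torch

def reward_pressing(hand_pos, button_pos, button_pressed, torques):
    """Hypothetical shaped reward for a 'Pressing' skill (illustration only).

    hand_pos, button_pos: (num_envs, 3) end-effector and target positions.
    button_pressed:       (num_envs,) float in [0, 1], how far the button is depressed.
    torques:              (num_envs, num_dofs) joint torques, used as a regularizer.
    """
    # Reaching term: exponential kernel on hand-to-button distance.
    dist = torch.norm(hand_pos - button_pos, dim=-1)
    reach_reward = torch.exp(-4.0 * dist)

    # Task term: reward actually depressing the button.
    press_reward = 2.0 * button_pressed

    # Regularization: penalize large torques to keep motions smooth.
    torque_penalty = 0.0002 * torch.sum(torques ** 2, dim=-1)

    return reach_reward + press_reward - torque_penalty
```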
- `src/`
  - `config.py` – skill library + prompt templates
  - `skill_selection.py` – LLM task decomposition
  - `skill_manager.py` – missing-skill detection
  - `reward_generator.py` – reward orchestration + cleaning
  - `utils.py` – experiment helpers, JSON utilities, text cleaning
- `envs/`
  - `h1_basic.py` – humanoid environment context provided to the LLM
  - `rewards.py` – example reward classes for in-context learning
- `experiments/` – auto-generated per-task directories
- `assets/` – diagrams and images
- `skillblender/` – original SkillBlender codebase (submodule)
- `tests/` – unit tests for core modules
Each run folder under `experiments/` contains:

- `decomposition.json` – user task metadata, structured subtasks, and skills (known, missing); see the loading sketch below
- `reward_<skill>.py` – initial reward implementation
- `reward_<skill>_v<x>.py` – generated by `run_reward_refinement.py` after RL feedback loops
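As an illustration of the decomposition format, one way to inspect it is shown below. The keys in the commented example are assumptions; check a `decomposition.json` from a real run for the exact schema.

```python
import json

# Illustrative only: the exact keys in decomposition.json may differ.
with open("experiments/buttonpress/decomposition.json") as f:
    decomposition = json.load(f)

# Assumed shape: task metadata plus a list of subtasks, each naming the
# skills it needs and which of them are missing from the library, e.g.
# {
#   "task": "Button Press",
#   "subtasks": [
#     {"name": "Walk to the button", "skills": ["Walking"],  "missing": []},
#     {"name": "Press the button",   "skills": ["Pressing"], "missing": ["Pressing"]}
#   ]
# }
```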
`run_reward_refinement.py` injects real RL metrics back into the LLM to produce a revised reward file and a JSON summary so you can track iterations.
```bash
python run_reward_refinement.py \
  --subgoal "Press the button" \
  --skill "Pushing" \
  --reward-file experiments/buttonpress/reward_pushing.py \
  --feedback-file experiments/buttonpress/metrics_1.json \
  --iteration 1
```

This command:

- feeds the original reward plus feedback metrics to the LLM and writes `reward_pushing_v1.py`
- records a structured summary (paths, metrics, iteration label) to `feedback_pushing_v1.json`
Only the iteration number needs to change between runs: for additional refinements use `--iteration 2`, `--iteration 3`, etc., to keep a growing history of reward tweaks per skill.
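The file passed via `--feedback-file` holds the training metrics you collected; its exact schema depends on your RL setup. The snippet below only illustrates one plausible way to dump per-skill training statistics into a `metrics_1.json` for the refinement loop; the field names and values are hypothetical.

```python
import json

# Hypothetical training statistics for the "Pushing" skill; field names are
# illustrative, not a schema required by run_reward_refinement.py.
metrics = {
    "skill": "Pushing",
    "iterations": 3000,
    "mean_episode_reward": 12.4,
    "success_rate": 0.35,
    "notes": "robot reaches the button but rarely applies enough force",
}

with open("experiments/buttonpress/metrics_1.json", "w") as f:
    json.dump(metrics, f, indent=2)
```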
- The pipeline currently assumes OpenRouter access.
- Customize prompts, skill libraries, or environment snippets by editing `src/config.py` and the `envs/` files (one way a skill-library entry might look is sketched below).
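For example, adding a new primitive to the skill library might look like the following. The variable name `SKILL_LIBRARY` and its structure are assumptions about `src/config.py`; mirror whatever the file actually defines.

```python
# In src/config.py (structure is an assumption; follow the existing entries).
SKILL_LIBRARY = {
    "Walking":   "Locomote to a target position while staying balanced.",
    "Reaching":  "Move an end effector to a target pose.",
    "Squatting": "Lower and raise the torso while keeping the feet planted.",
    # New primitive: once listed here, skill_manager.py treats it as known
    # and will not ask reward_generator.py to synthesize a reward for it.
    "Kicking":   "Swing one leg to strike a target while balancing on the other.",
}
```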













