SkillBlenderLC

SkillBlenderLC extends SkillBlender by adding an LLM-conditioned pipeline that can:

  • take a natural-language task,
  • decompose it into subtasks and required skills,
  • automatically generate reward functions for missing skills,
  • refine those rewards using feedback from RL training.

Overview

System Overview

SkillBlenderLC wraps three ideas around the original SkillBlender training stack:

  1. Language-conditioned planning: skill_selection.py prompts an LLM with the current skill library (from src/config.py) to decompose a user task into sequential subtasks and skills.
  2. Automatic reward bootstrapping: skill_manager.py detects skills that are not in the library (see the sketch after this list) and calls reward_generator.py to synthesize reward code using:
    • example reward classes (envs/rewards.py)
    • humanoid environment dynamics (envs/h1_basic.py)
  3. Reward refinement loop: run_reward_refinement.py takes RL training metrics and regenerates improved reward functions, enabling self-evolving skill acquisition.
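
The snippet below is a minimal, hypothetical sketch of steps 1 and 2. The decomposition structure, key names, and skill-library contents are illustrative assumptions and may not match the exact schema produced by skill_selection.py and src/config.py.

    # Hypothetical illustration of steps 1 and 2. The decomposition structure,
    # skill names, and library contents are assumptions for illustration only;
    # the real schema comes from skill_selection.py and src/config.py.
    decomposition = {
        "task": "Button Press",
        "subtasks": [
            {"subgoal": "Walk to the button", "skill": "Walking"},
            {"subgoal": "Press the button", "skill": "Pressing"},
        ],
    }

    skill_library = {"Walking", "Squatting", "Reaching"}  # illustrative contents

    # skill_manager.py-style check: skills absent from the library need generated rewards.
    missing = [s["skill"] for s in decomposition["subtasks"]
               if s["skill"] not in skill_library]
    print(missing)  # ['Pressing'] -> reward_generator.py would synthesize its reward code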

LLM Performance — Skill Selection

Tested on the eight long-horizon tasks from the original SkillBlender paper.

Skill Selection Results

LLM Performance — Reward Generation for Primitive Skill: Squatting

Performance of the robot on the primitive skill Squatting, comparing the ground-truth reward with reward functions generated by different LLMs: GPT-4o, Claude-sonnet-4.5, Gemini-2.5-flash, Llama-3.3-70b-instruct, DeepSeek-Chat-V3-0324, and Mistral-large.

LLM Performance — Reward Generation for Long-Horizon Task: Button Press

Performance of the robot on the Button Press task after learning the new skill Pressing, comparing a ground-truth reward function with an LLM-generated one.

LLM Performance — Reward Generation for New Skills

Performance of the robot on new primitive skills: Bend Backwards, Side Stepping, and Kicking.

Quick Start

  1. Install dependencies

    pip install -r requirements.txt
  2. Configure OpenRouter: create a .env file with your OpenRouter credentials:

    OPENROUTER_API_KEY=...
    OPENROUTER_REFERRER=...
    OPENROUTER_TITLE=SkillBlenderLC
    
  3. Run a task

    python run_task.py \
      --task "Button Press" \
      --description "Walk to the button and press it"

    This generates an experiment folder containing:

    • decomposition.json – structured subtasks
    • reward_<skill>.py – reward file for each new skill

    These reward scripts can be dropped directly into the SkillBlender RL training loop.

Repository Layout

  • src/
    • config.py – skill library + prompt templates
    • skill_selection.py – LLM task decomposition
    • skill_manager.py – missing-skill detection + reward-generation calls
    • reward_generator.py – reward orchestration + cleaning
    • utils.py – experiment helpers, JSON utilities, text cleaning
  • envs/
    • h1_basic.py – humanoid environment context provided to the LLM
    • rewards.py – example reward classes for in-context learning (illustrative sketch below)
  • experiments/ – auto-generated per-task directories
  • assets/ – diagrams and images
  • skillblender/ – original SkillBlender codebase (submodule)
  • tests/ – unit tests for core modules
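
The reward classes in envs/rewards.py serve as in-context examples for the LLM. The hypothetical sketch below shows the general shape such a class might take; the class name, constructor arguments, and compute() signature are illustrative assumptions, not the repository's actual interface.

    # Hypothetical example of the kind of reward class used for in-context learning.
    # The class name, parameters, and compute() signature are assumptions for
    # illustration; see envs/rewards.py for the real examples.
    import torch


    class SquattingReward:
        """Reward tracking of a target pelvis height while staying upright."""

        def __init__(self, target_height: float = 0.55, height_sigma: float = 0.05):
            self.target_height = target_height
            self.height_sigma = height_sigma

        def compute(self, pelvis_height: torch.Tensor,
                    orientation_error: torch.Tensor) -> torch.Tensor:
            # Peak reward when the pelvis sits at the target squat height.
            height_term = torch.exp(-((pelvis_height - self.target_height) ** 2)
                                    / self.height_sigma ** 2)
            # Penalize deviation from an upright torso.
            upright_term = torch.exp(-orientation_error ** 2)
            return height_term * upright_term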

Experiment Artifacts

Each run folder under experiments/ contains:

  • decomposition.json – user task metadata, structured subtasks, and skills (known and missing)
  • reward_<skill>.py – initial reward implementation
  • reward_<skill>_v<x>.py – generated by run_reward_refinement.py after RL feedback loops
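
A quick way to inspect one of these run folders is sketched below; the folder name and the "subtasks" key are assumptions based on the layout described here, so adjust them to your actual output.

    # Sketch for inspecting a generated run folder. The folder name and the
    # "subtasks" key are assumptions; adapt them to your experiment output.
    import json
    import pathlib

    exp = pathlib.Path("experiments/buttonpress")  # example run folder
    decomposition = json.loads((exp / "decomposition.json").read_text())
    print(decomposition.get("subtasks"))           # structured subtasks, if present

    for reward_file in sorted(exp.glob("reward_*.py")):
        print(reward_file.name)                    # initial + refined reward versions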

Refining Rewards Post-Training

run_reward_refinement.py injects real RL metrics back into the LLM to produce a revised reward file and a JSON summary so you can track iterations.

python run_reward_refinement.py \
  --subgoal "Press the button" \
  --skill "Pushing" \
  --reward-file experiments/buttonpress/reward_pushing.py \
  --feedback-file experiments/buttonpress/metrics_1.json \
  --iteration 1

This command:

  • feeds the original reward plus feedback metrics to the LLM and writes reward_pushing_v1.py
  • records a structured summary (paths, metrics, iteration label) to feedback_pushing_v1.json

Pass the iteration as a bare number; for additional refinements use --iteration 2, --iteration 3, and so on, to keep a growing history of reward tweaks per skill.
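
For reference, a feedback/metrics file might look like the sketch below. Every key and value is an illustrative assumption rather than a schema documented by run_reward_refinement.py; adapt it to whatever metrics your RL run actually logs.

    # Hypothetical feedback file for the command above. All keys and values are
    # illustrative assumptions; use the metrics your own training run produces.
    import json
    import pathlib

    run_dir = pathlib.Path("experiments/buttonpress")
    run_dir.mkdir(parents=True, exist_ok=True)

    metrics = {
        "skill": "Pushing",
        "iteration": 1,
        "mean_episode_reward": 12.4,
        "success_rate": 0.31,
        "mean_episode_length": 412,
        "notes": "Reaches the button but rarely completes the press.",
    }

    with open(run_dir / "metrics_1.json", "w") as f:
        json.dump(metrics, f, indent=2)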

Notes

  • The pipeline currently assumes OpenRouter access (a minimal request sketch follows these notes).
  • Customize prompts, skill libraries, or environment snippets by editing src/config.py and envs/ files.
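
For reference, an OpenRouter request made through the OpenAI-compatible client typically looks like the sketch below, using the variables from .env. The model slug and the use of python-dotenv are examples; the project's real request logic lives in the src/ modules and may differ.

    # Minimal OpenRouter call sketch using the OpenAI-compatible API and the
    # variables from .env. The model slug is an example; the project's actual
    # request code lives in src/ and may differ.
    import os
    from openai import OpenAI
    from dotenv import load_dotenv

    load_dotenv()  # reads OPENROUTER_API_KEY, OPENROUTER_REFERRER, OPENROUTER_TITLE

    client = OpenAI(
        base_url="https://openrouter.ai/api/v1",
        api_key=os.environ["OPENROUTER_API_KEY"],
    )

    response = client.chat.completions.create(
        model="openai/gpt-4o",  # example model slug
        messages=[{"role": "user",
                   "content": "Decompose: walk to the button and press it."}],
        extra_headers={
            "HTTP-Referer": os.environ.get("OPENROUTER_REFERRER", ""),
            "X-Title": os.environ.get("OPENROUTER_TITLE", "SkillBlenderLC"),
        },
    )
    print(response.choices[0].message.content)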
