Awesome-Agent-Reward-Construction


A curated list of awesome resources about reward construction for AI agents. This repository covers cutting-edge research and practical guides on defining and collecting rewards to build more intelligent and aligned AI agents.


Introduction

What is Reward Construction?

Reward construction is the process of designing and collecting reward signals that guide AI agents toward desired behaviors and outcomes.
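Concretely, the simplest form of reward construction is an answer checker for a verifiable task. The sketch below is illustrative only; `extract_answer` and the response format are assumptions, not part of any specific framework:

```python
def extract_answer(response: str) -> str:
    """Pull the final answer out of a model response (toy heuristic:
    assume the answer is the last non-empty line)."""
    return response.strip().splitlines()[-1].strip()

def verifiable_reward(response: str, ground_truth: str) -> float:
    """Binary outcome reward: 1.0 if the extracted answer matches."""
    return 1.0 if extract_answer(response) == ground_truth else 0.0

print(verifiable_reward("Reasoning...\n42", "42"))  # 1.0
print(verifiable_reward("Reasoning...\n41", "42"))  # 0.0
```

Real systems replace the string match with stronger verifiers (unit tests, symbolic math checkers, game engines), but the shape of the signal is the same.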

Why is Reward Construction Important?

Background: The Second Half & Era of Experience

The Second Half: Transitioning from creating new methods and models to defining new tasks

  • First Half Focus: Exam-like tasks with universal methods (next token prediction, RL) and architectures (Transformer, GPT)
  • Turning Point: Organic combination of universal methods and architectures, where RL on large models achieves generalization
  • Second Half Focus: Project-based scenarios with multi-turn interactions and temporal learning

Era of Experience: Large Models + Reinforcement Learning = General Superhuman Agents

  • Previous Era: Human Data Era with limitations of human-generated data and capabilities
  • Current Opportunity: Combining self-discovery capabilities with task generality from the human data era
  • Key Components: Environmental rewards, autonomous interaction, continuous experience streams, non-human planning and reasoning

In conclusion, reward construction provides interactive environments and learning signals, and it is crucial for AI agents to gain experience on new projects. We divide reward construction research into 5 categories: Synthesizing Verifiable Tasks, Real-World Task Reward Construction, Unsupervised Reward Construction, Reward Model Construction, and Evaluation and Benchmarks.

Synthesizing Verifiable Task

Scaling task quantity by constructing new verifiable task gyms, such as puzzles and games. Training agents to solve these tasks can enhance a model's general reasoning capabilities. We divide this line of research into 5 types: Multi-Modal Reasoning, Text-Based Puzzle Solving, Zero-Sum Games, Converting General Tasks to Verifiable Tasks, and Curriculum Learning.

Multi-Modal Reasoning

Text-Based Puzzle Solving

Zero-Sum Games

Converting General Tasks to Verifiable Tasks

Transforming general tasks, which are usually trained through pretraining and SFT, into RL-compatible formats.

Curriculum Learning

Scaling task difficulty through curriculum learning and converting sparse rewards into dense rewards.
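Both ideas can be sketched in a few lines. The partial-credit scheme and the promotion threshold below are illustrative assumptions, not a prescription from any particular paper:

```python
def dense_reward(steps_solved: int, total_steps: int) -> float:
    """Densify a sparse 0/1 outcome reward by giving partial credit
    for each verified sub-step."""
    return steps_solved / total_steps

def next_difficulty(level: int, success_rate: float,
                    promote_at: float = 0.8, max_level: int = 10) -> int:
    """Curriculum schedule: raise the difficulty level only once the
    agent masters the current one."""
    return min(level + 1, max_level) if success_rate >= promote_at else level

print(dense_reward(3, 5))       # 0.6
print(next_difficulty(2, 0.9))  # 3 (promoted)
print(next_difficulty(2, 0.5))  # 2 (stays)
```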

Real-World Task Reward Construction

Designing reward functions and synthesizing data to scale up real-world reward signals. We divide Real-World Task Reward Construction research into 4 types: Web Search, GUI, Embodied AI & Vision-Language-Action Models, and World Model.

Web Search

GUI

Embodied AI & Vision-Language-Action Model

World Model

Looking toward the future: using world models and real-world interactions for reward construction.

Unsupervised Reward Construction

Finding reward signals in model internals: the model generates data to train itself. We divide Unsupervised Reward Construction into 2 types: Proposer and Solver, and Internal Signal Mining, which includes the discussion of whether large reasoning models can self-train.

Proposer and Solver

Models simultaneously act as problem proposers and solution generators, creating new training data.
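A toy sketch of the proposer-solver loop. In practice both roles are played by the same language model; here they are stubbed with a tiny arithmetic task so the loop is runnable, and all names are illustrative:

```python
import random

def propose_task(rng: random.Random) -> tuple[str, int]:
    """Proposer: emit a task together with its verifiable answer."""
    a, b = rng.randint(1, 9), rng.randint(1, 9)
    return f"{a}+{b}=?", a + b

def solve_task(task: str) -> int:
    """Solver: attempt the task (stubbed as exact arithmetic)."""
    a, b = task.rstrip("=?").split("+")
    return int(a) + int(b)

def self_play_round(rng: random.Random) -> float:
    """One round: propose, solve, and score against the proposer's answer."""
    task, answer = propose_task(rng)
    return 1.0 if solve_task(task) == answer else 0.0

rng = random.Random(0)
rewards = [self_play_round(rng) for _ in range(5)]
print(sum(rewards))  # 5.0 (the stub solver always succeeds)
```

With a real model in both roles, the proposer is additionally rewarded for generating tasks at the edge of the solver's ability, which is what makes the loop produce useful new training data.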

Internal Signal Mining

Extracting learning signals from model internals, confidence scores, and consistent behaviors, without external validation or verification mechanisms.
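One widely used internal signal is self-consistency: with no external verifier, the majority answer across sampled responses serves as a pseudo-label, and agreement with it becomes the reward. A minimal sketch, assuming answers have already been extracted as strings:

```python
from collections import Counter

def consistency_rewards(sampled_answers: list[str]) -> list[float]:
    """Reward each sampled answer by its agreement with the majority vote."""
    majority, _ = Counter(sampled_answers).most_common(1)[0]
    return [1.0 if a == majority else 0.0 for a in sampled_answers]

print(consistency_rewards(["42", "42", "41", "42"]))  # [1.0, 1.0, 0.0, 1.0]
```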

Reward Model Construction

Scaling preference data for reward model training to enable policy learning on general tasks.
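Reward models are commonly trained on such preference pairs with the Bradley-Terry objective: the loss pushes the score of the chosen response above the score of the rejected one. A scalar sketch (real implementations operate on batched model outputs):

```python
import math

def preference_loss(score_chosen: float, score_rejected: float) -> float:
    """Bradley-Terry pairwise loss: -log sigmoid(r_chosen - r_rejected)."""
    margin = score_chosen - score_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

print(round(preference_loss(2.0, 0.0), 3))  # small loss: ranking is correct
print(round(preference_loss(0.0, 2.0), 3))  # large loss: ranking is wrong
```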

Generative Reward Model

Reward Model Pretraining

Multi-Modal Reward Models

Process Supervision

Evaluation and Benchmarks

Providing benchmarks and gyms to evaluate model performance. We divide Evaluation and Benchmarks into types including Reward Model Benchmarks, Game Gym, Web Search Evaluation, Computer Use Evaluation, and New Evaluation Dimensions.

Reward Model Benchmarks

Game Gym

Web Search Evaluation

Computer Use Evaluation

Limitation and Future Work

  • Game data isn't utilized thoroughly. Games have been shown to be effective at enhancing models' general abilities, but open-source models rarely include games in their training data.
  • Using world models to construct rewards. World models can generate an endless variety of action-controllable, playable 3D environments for training and evaluating embodied agents.
  • Evaluating interactable environments. A method to evaluate interactable environments, identify high-quality ones, and choose environments that fit the policy model's ability level would greatly boost training.

Contributing

We welcome contributions to this repository! Please feel free to:

  1. Submit pull requests to add new papers
  2. Improve paper categorization and descriptions
  3. Add implementation details or code repositories
  4. Suggest new categories or reorganization

When adding papers, please include:

  • Paper title and authors
  • Brief description of the reward construction method
  • Key contributions and results
  • Links to paper and code (if available)

Citation

If you find this repository useful, please consider citing:

@misc{awesome-agent-reward,
  title={Awesome Agent Reward: Reward Construction for AI Agents},
  author={Jingqi Tong and Yurong Mou and Jun Zhao and Hangcheng Li and Yongzhuo Yang and Mingzhe Li and Zhangye Yin},
  year={2025},
  url={https://github.com/tongjingqi/Awesome-Agent-Reward}
}

Note: This is a living document that will be continuously updated as the field of agent reward construction evolves. Stay tuned for the latest developments!
