Realtime Reasoning Gym is a specialized evaluation framework for testing how well language agents can reason and make decisions under real-time constraints. Unlike traditional OpenAI Gym environments where agents have unlimited thinking time, Realtime Reasoning Gym enforces strict time budgets (measured in seconds) or token budgets (measured in LLM decoding tokens) to simulate real-world pressure.
Furthermore, each environment in the gym offers multiple cognitive load levels that vary the intellectual complexity of tasks, enabling a comprehensive assessment of whether agents can balance decision quality and speed when facing different levels of cognitive load and time pressure.
Upper: We create three real-time games, Freeway, Snake, and Overcooked, to study the challenge of real-time reasoning. Lower: Under cognitive load and time pressure, AgileThinker (Ours), which engages both the reactive and the planning reasoning paradigm, consistently outperforms agents that engage only one of them. Scores are averaged across games.

See our post, paper, website, and dataset for more detailed demonstrations and explanations.
- Install Realtime Reasoning Gym:
  ```bash
  git clone https://github.com/wenyl22/RealtimeGym.git
  cd RealtimeGym

  # Install in development mode
  pip install -e .

  # Install development dependencies
  pip install -e ".[dev]"
  ```
- Set up API keys in `.env`; a template is provided in `.env.example`.
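The variable names below are hypothetical; check `.env.example` for the ones the project actually reads. For an OpenAI-compatible setup, the file might look like:

```bash
# Hypothetical .env contents -- consult .env.example for the real variable names
OPENAI_API_KEY=your-api-key
OPENAI_BASE_URL=https://api.openai.com/v1
```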
In typical OpenAI Gym environments, agents have unlimited time to think:
```python
# Traditional approach - unbounded thinking time
obs, done = env.reset()
while not done:
    action = agent.act(obs)  # Can take minutes!
    obs, reward, done, info = env.step(action)
```

RealtimeGym introduces explicit constraints on the agent's thinking time:
```python
# Real-time approach - bounded thinking time
obs, done = env.reset()
while not done:
    agent.observe(obs)                       # Fast observation
    agent.think(timeout=8192)                # Bounded thinking (tokens or seconds)
    action = agent.act() or DEFAULT_ACTION   # Default if a decision isn't made in time
    obs, done, reward, reset = env.step(action)
```

Real-time reasoning requires agents to balance correctness and timeliness.
Realtime Reasoning Gym includes three real-time games with increasing cognitive loads:
| Game | Description | Actions | Cognitive Load Levels |
|---|---|---|---|
| Freeway | Cross busy roads avoiding cars | U (up), D (down), S (stay) | v0 (Easy), v1 (Medium), v2 (Hard) |
| Snake | Classic snake game with a growing body | U, D, L (left), R (right), S | v0, v1, v2 |
| Overcooked | Cooperative cooking simulation | U, D, L, R, I (interact), S | v0, v1, v2 |
Create any environment:
```python
env, real_seed, renderer = realtimegym.make('Freeway-v2', seed=0, render=False)

# Notes:
# - seed != real_seed, because real_seed embeds the cognitive load level.
# - renderer is None if render=False; otherwise call renderer.render(env)
#   to visualize the game state. See examples/basic_renderer.py for details.
```
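Putting these pieces together, a minimal end-to-end episode looks like the sketch below; the hard-coded 'S' (stay) fallback and the numeric reward accumulation are illustrative assumptions standing in for a real agent's observe/think/act cycle:

```python
import realtimegym

# Minimal episode sketch: a trivial policy that always stays put.
# A real agent would call observe()/think()/act() within its budget instead.
env, real_seed, renderer = realtimegym.make('Freeway-v0', seed=0, render=False)

obs, done = env.reset()
total_reward = 0.0
while not done:
    action = 'S'  # 'S' (stay), taken from the action table above
    obs, done, reward, reset = env.step(action)
    total_reward += reward

print(f"Episode finished with total reward {total_reward}")
```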
Realtime Reasoning Gym supports two time constraint types:

Token budget:

```python
agent.think(timeout=8192)  # Environment evolves after 8192 decoding tokens
```

- Best for: LLM-based agents
- Measures: Token count from API calls
- Advantage: Platform-independent, reproducible
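To make token accounting concrete, here is a minimal sketch of a bounded call against an OpenAI-compatible endpoint; it is not RealtimeGym's internal implementation, and the model name is a placeholder:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def bounded_think(messages: list, budget: int):
    """Request at most `budget` decoding tokens and report how many were spent."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=messages,
        max_tokens=budget,    # hard cap on decoding tokens
    )
    spent = resp.usage.completion_tokens  # tokens actually decoded
    return resp.choices[0].message.content, spent
```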
Seconds budget:

```python
agent.think(timeout=5.0)  # Environment evolves after 5 seconds
```

- Best for: Real-time deployment scenarios
- Measures: Wall-clock time
- Advantage: Direct real-world applicability
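Wall-clock budgets can be enforced with a simple deadline loop; the incremental `refine` step below is a hypothetical stand-in for whatever per-iteration reasoning an agent does:

```python
import time

def bounded_think_seconds(refine, budget_s: float):
    """Sketch: repeat a cheap reasoning step until the wall-clock deadline."""
    deadline = time.monotonic() + budget_s
    plan = None
    while time.monotonic() < deadline:
        plan, final = refine(plan)  # hypothetical incremental refinement step
        if final:
            break
    return plan
```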
Select the unit via `--time_unit token` or `--time_unit seconds`.
All of Realtime Reasoning Gym's built-in agents implement the `BaseAgent` interface:
```python
from typing import Any

from .base import BaseAgent

class MyAgent(BaseAgent):
    def __init__(
        self,
        prompts: Any,    # prompts: dynamically loaded module, mapping game state to text
        file: str,       # log file path
        time_unit: str,  # 'token' or 'seconds'
        **kwargs,
    ) -> None:
        super().__init__(prompts, file, time_unit)
        # Initialize internal state here

    def think(self, timeout: int) -> None:
        """Process information and plan the next action within the budget.

        Args:
            timeout: Token count (time_unit='token') or seconds (time_unit='seconds')
        """
        # Decision making within the timeout budget;
        # store the chosen action here.
        self.action = ...
```

RealtimeGym provides three ready-to-use agent types:
| Agent | Strategy | Best For | Supported Models |
|---|---|---|---|
| ReactiveAgent | Generate responses in bounded time | Low cognitive load, high time pressure scenarios | All OpenAI-compatible |
| PlanningAgent | Comprehensive planning without time constraints | High cognitive load, low time pressure scenarios | All OpenAI-compatible |
| AgileThinker | Hybrid reactive + planning | Balanced performance | Models with explicit reasoning tokens |
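As a concrete, if simplistic, instance of the `BaseAgent` interface, the sketch below hard-codes a reactive policy instead of querying a model; the `observe` and `act` hooks and the action set are assumptions based on the run loop and action tables above:

```python
import random
from typing import Any

from .base import BaseAgent

class MostlyStayAgent(BaseAgent):
    """Toy Freeway agent: usually stays put, occasionally moves up."""

    def __init__(self, prompts: Any, file: str, time_unit: str, **kwargs) -> None:
        super().__init__(prompts, file, time_unit)
        self.action = 'S'

    def observe(self, obs: dict) -> None:
        # Called by the run loop before think(); just cache the observation.
        self.obs = obs

    def think(self, timeout: int) -> None:
        # No model call, so any budget is trivially respected; a real agent
        # would spend up to `timeout` tokens or seconds reasoning here.
        self.action = random.choice(['S', 'S', 'S', 'U'])

    def act(self) -> str:
        return self.action
```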
Evaluate agents using the command-line interface:
```bash
# Detailed configuration
agile_eval --time_unit token \
    --time_pressure 8192 \
    --internal_budget 4096 \
    --game freeway \
    --cognitive_load E \
    --mode agile \
    --reactive-model-config configs/deepseek-v3.2-reactive.yaml \
    --planning-model-config configs/deepseek-v3.2-planning.yaml \
    --seed_num 1 --repeat_times 1
```
```bash
# Using more compact configurations
agile_eval --time_unit token \
    --settings freeway_H_8192_agile_4096 \
    --reactive-model-config configs/deepseek-v3.2-reactive.yaml \
    --planning-model-config configs/deepseek-v3.2-planning.yaml \
    --seed_num 8 --repeat_times 1
```
To create a custom environment:

- Create a new file in `src/realtimegym/environments/` inheriting from `BaseEnvironment`:

  ```python
  from .base import BaseEnvironment

  class MyGameEnv(BaseEnvironment):
      def __init__(self):
          super().__init__()

      def set_seed(self, seed: int):
          # Set the game's random seed.
          # In our implementation, the cognitive load level is embedded in the seed.
          pass

      def reset(self):
          # Initialize game state
          return observation, done

      def step(self, action):
          # Process the action and update state
          return observation, done, reward, reset

      def state_string(self):
          # Return a human-readable state as text
          return state_string

      def state_builder(self):
          # Return a detailed state dict
          return state_dict
  ```

- Register your environment in `src/realtimegym/__init__.py` (see the sketch after this list).

- (Optional) To evaluate built-in LLM agents, create prompts in `src/realtimegym/prompts/mygame.py` following existing patterns. Specifically, you need to implement a function:

  ```python
  def state_to_description(observation: dict, mode: str) -> str | dict:
      # Return different descriptions based on agent mode
      if mode == "reactive":
          return text_description_for_reactive_agent
      elif mode == "planning":
          return text_description_for_planning_agent
      elif mode == "agile":
          return {
              "planning": text_description_for_planning_agent,
              "reactive": text_description_for_reactive_agent,
          }
  ```
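The actual registration mechanism lives in `src/realtimegym/__init__.py`; as a purely hypothetical sketch, an id-to-class registry might look like:

```python
# Hypothetical registration sketch -- consult src/realtimegym/__init__.py
# for the project's actual mechanism.
from .environments.mygame import MyGameEnv

ENVIRONMENTS = {
    # One id per cognitive load level, following the built-in naming scheme.
    'MyGame-v0': MyGameEnv,
    'MyGame-v1': MyGameEnv,
    'MyGame-v2': MyGameEnv,
}
```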
Core API at a glance:

```python
# Environment creation
env, seed, renderer = realtimegym.make(env_id, seed=0, render=False)

# Environment interaction
obs, done = env.reset()
obs, done, reward, reset = env.step(action)
```

Each observation is a dictionary of the form:

```python
{
    'state_string': str,  # Human-readable game state
    'game_turn': int,     # Current turn number
    'state': dict         # Detailed game-specific state
}
```
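For example, an agent might fold an observation into LLM prompt text like this (a sketch; the exact formatting is an assumption):

```python
# Sketch: turning an observation into prompt text for a model call.
obs, done = env.reset()
prompt_text = f"Turn {obs['game_turn']}:\n{obs['state_string']}"
```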
Agents accept YAML configuration files specifying model parameters:

```yaml
model: "gpt-4o-mini"
api_key: "your-api-key"
inference_parameters:
  temperature: 0.7
  max_tokens: 1000
```
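For reference, here is a sketch of how such a file could be loaded and forwarded to an OpenAI-compatible client; this is hypothetical glue code, not RealtimeGym's own loader, and the config path is a placeholder:

```python
import yaml
from openai import OpenAI

# Hypothetical glue code, not RealtimeGym's loader.
with open("configs/my-model.yaml") as f:  # placeholder path
    cfg = yaml.safe_load(f)

client = OpenAI(api_key=cfg["api_key"])
resp = client.chat.completions.create(
    model=cfg["model"],
    messages=[{"role": "user", "content": "Your move?"}],
    **cfg.get("inference_parameters", {}),  # temperature, max_tokens, ...
)
print(resp.choices[0].message.content)
```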
Run the comprehensive test suite:

```bash
# All tests
pytest

# Specific test categories
pytest tests/test_environments.py
pytest tests/test_agents.py
```
Explore the `examples/` directory:

```bash
# Basic usage patterns
python examples/basic_usage.py

# Basic usage with rendering
python examples/basic_renderer.py

# Compare all environments
python examples/all_environments.py

# Custom agent implementation
python examples/custom_agent.py

# Cognitive load level analysis
python examples/difficulty_levels.py
```
If you use Realtime Reasoning Gym in your research, please cite our work:

```bibtex
@software{realtimegym2025,
  title={Realtime Reasoning Gym: A Real-Time Learning Environment for Language Agents},
  author={Yule Wen and Yixin Ye and Yanzhe Zhang and Diyi Yang and Hao Zhu},
  year={2025},
  url={https://github.com/wenyl22/RealtimeGym},
  note={A framework for evaluating language agents under real-time constraints}
}
```

RealtimeGym builds upon the excellent open-source project Overcooked. We thank the original authors for their contributions.
- Homepage: https://github.com/wenyl22/RealtimeGym
- Documentation: https://bleaves.github.io/real-time-reasoning/
- Issues: https://github.com/wenyl22/RealtimeGym/issues
- Discussions: https://github.com/wenyl22/RealtimeGym/discussions
RealtimeGym - Advancing real-time reasoning capabilities in language agents
