feat: enable v2 training pipeline with controller parity #1327

Draft

garrett4wade wants to merge 2 commits into main from fw/rl3

Conversation

@garrett4wade (Collaborator)

Description

Bring GatewayTrainController and RolloutControllerV2 to full parity with v1 controllers, enabling the v2 training pipeline for RL training paths.

Type of Change

  • 🐛 Bug fix
  • ✨ New feature
  • 💥 Breaking change
  • 📝 Documentation update
  • ♻️ Refactoring
  • ⚡ Performance improvement
  • ✅ Test coverage improvement

Key Changes

V2 Controller Parity

  • Route sglang_remote and vllm_remote to RolloutControllerV2 when config._version == "v2" (see the sketch after this list)
  • Add thread-safe version management, connect_engine guard address support, and clear_batches RTensor storage eviction to GatewayTrainController
  • Direct config_perf_tracer calls to individual workers instead of gateway relay
  • Pass staleness_manager to WorkflowExecutor in RolloutControllerV2
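
A minimal sketch of the version gate on the rollout path, assuming a factory-style helper. Only RolloutControllerV2 and config._version come from this PR; make_rollout_controller, the v1 class name, and the constructor calls are illustrative, and imports of the controller classes are omitted.

# Hypothetical factory sketch; RolloutControllerV2 / RolloutController stand in
# for the repository's controller classes (imports omitted).
def make_rollout_controller(config, backend: str):
    # sglang_remote / vllm_remote use the v2 controller only when the config
    # explicitly opts in through the private _version field.
    if backend in ("sglang_remote", "vllm_remote") and getattr(config, "_version", "v1") == "v2":
        return RolloutControllerV2(config)
    # All other cases keep the v1 behavior unchanged.
    return RolloutController(config)
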

AsyncRewardWrapper Lifecycle

  • Replace weakref finalization + instance counting with atexit shutdown for all shared executors (sketched after this list)
  • Simplify retry logic and executor recreation with compare-and-swap guard
  • Reuse AsyncRewardWrapper instances in math agent workflows instead of creating per-call
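
A rough sketch of the new executor lifecycle, assuming a module-level shared ThreadPoolExecutor. The helper names, lock, and worker count below are illustrative; only the atexit shutdown and the compare-and-swap-style recreation guard mirror what the PR describes.

import atexit
import threading
from concurrent.futures import ThreadPoolExecutor

_lock = threading.Lock()
_shared_executor = None  # created lazily, recreated if it was torn down

def _get_executor() -> ThreadPoolExecutor:
    """Return the shared executor, recreating it under a CAS-style guard."""
    global _shared_executor
    ex = _shared_executor
    if ex is not None:
        return ex
    with _lock:
        # Re-check under the lock so only one thread swaps in a new executor.
        if _shared_executor is None:
            _shared_executor = ThreadPoolExecutor(max_workers=4)
        return _shared_executor

@atexit.register
def _shutdown_executor() -> None:
    # A single process-exit hook replaces per-instance weakref finalizers.
    if _shared_executor is not None:
        _shared_executor.shutdown(wait=False)
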

HTTP Client Unification

  • Use create_httpx_client consistently in workflow_context.py
  • Add sock_connect/connect timeouts to aiohttp sessions (see the sketch after this list)
  • Unify HTTP client session usage across inference/training controllers
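
An illustration of the added connect budgets, assuming a plain aiohttp session; create_httpx_client is the repository helper named above and is not shown here, and the timeout values are made up.

import aiohttp

# Separate connect / sock_connect budgets make a hung TCP handshake fail fast
# instead of consuming the whole total timeout (values illustrative).
TIMEOUT = aiohttp.ClientTimeout(total=300, connect=10, sock_connect=10)

async def fetch_text(url: str) -> str:
    async with aiohttp.ClientSession(timeout=TIMEOUT) as session:
        async with session.get(url) as resp:
            resp.raise_for_status()
            return await resp.text()
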

Example Configs

  • Add agent: section (mode: inline, export_style: individual, turn_discount: 1.0) to all example YAML configs (shown after this list)
  • Switch default workflow from RLVRWorkflow to MathAgent in gsm8k_rl.py
  • Add max_tokens to generation config
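
For reference, the new agent block rendered as the Python mapping it parses into. The three agent keys and values come from this PR's description; the max_tokens value and the surrounding structure are assumptions.

# What the new agent: section in the example YAML configs parses into;
# only these three keys are taken from the PR, the rest of the YAML is unchanged.
agent_section = {
    "mode": "inline",
    "export_style": "individual",
    "turn_discount": 1.0,
}

# Generation config gains an explicit cap (the number below is illustrative).
generation_section = {
    "max_tokens": 1024,
}
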

Cleanup

  • Remove obsolete get_custom_reward_fn and VALID_REWARD_FN from areal/reward/__init__.py
  • Remove gateway HTTP helper tests superseded by unified client

Risk Areas

  • Breaking: get_custom_reward_fn removed from the reward public API; callers using this function will need to import reward functions directly (migration sketch after this list)
  • Force push: Branch history rewritten (rebase + squash onto latest main)
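
A hedged migration sketch for the breaking change; the replacement module path and function name are illustrative, since the PR only states that reward functions must now be imported directly.

# Before (removed in this PR):
#     from areal.reward import get_custom_reward_fn
#     reward_fn = get_custom_reward_fn(name)
#
# After: import the concrete reward function from its own module.
# The path and name below are hypothetical, not the repository's actual layout.
from areal.reward.my_reward import my_reward_fn  # hypothetical

reward_fn = my_reward_fn
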

Checklist

  • Pre-commit hooks pass (pre-commit run --all-files)
  • New tests added (tests/test_async_reward_wrapper.py)
  • Branch is up to date with main
  • This PR was created by a coding agent via /create-pr

Test Commands

uv run pytest tests/test_async_reward_wrapper.py
uv run pytest tests/experimental/inference_service/test_controller_version.py
uv run pytest tests/test_examples.py

Skipped suites: GPU/distributed tests (tests/grpo/, tests/torchrun/), which require multi-GPU hardware not available locally.

Commits

Commit 1:

Bring GatewayTrainController and RolloutControllerV2 to full parity with v1 controllers for RL training paths.

Key changes:
- Route to RolloutControllerV2 when config._version == "v2"
- Add version management, connect_engine, clear_batches to GatewayTrainController
- Simplify AsyncRewardWrapper lifecycle with atexit shutdown
- Unify HTTP client sessions across inference/training controllers
- Switch default workflow to MathAgent in example configs
- Add agent config section to all example YAML files
- Remove obsolete get_custom_reward_fn from reward module
- Add async reward wrapper tests

Commit 2:

Partial groups produce inconsistent training data. Reject the entire group if any _run_one call raises, instead of silently returning the successful subset with 0.0 rewards for failures.
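
A sketch of that behavior, assuming _run_one is an async rollout coroutine invoked once per group member; the gather-and-raise structure is illustrative, not the repository's exact code.

import asyncio

async def run_group(items):
    # Collect exceptions instead of letting the first failure cancel the rest.
    results = await asyncio.gather(
        *(_run_one(x) for x in items), return_exceptions=True
    )
    failures = [r for r in results if isinstance(r, BaseException)]
    if failures:
        # Reject the whole group: padding failures with 0.0 rewards would
        # yield an inconsistent training batch.
        raise RuntimeError(f"{len(failures)}/{len(items)} rollouts failed") from failures[0]
    return results
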