feat: enable v2 training pipeline with controller parity #1327
Draft
garrett4wade wants to merge 2 commits into
Conversation
Bring GatewayTrainController and RolloutControllerV2 to full parity with v1 controllers for RL training paths. Key changes:

- Route to RolloutControllerV2 when `config._version == "v2"`
- Add version management, `connect_engine`, and `clear_batches` to GatewayTrainController
- Simplify AsyncRewardWrapper lifecycle with `atexit` shutdown
- Unify HTTP client sessions across inference/training controllers
- Switch default workflow to MathAgent in example configs
- Add an agent config section to all example YAML files
- Remove obsolete `get_custom_reward_fn` from the reward module
- Add async reward wrapper tests
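A minimal sketch of the version switch described above, assuming a factory-style entry point. Only `RolloutControllerV2`, the backend names, and `config._version` come from this PR; the factory function and the v1 class name are hypothetical:

```python
from types import SimpleNamespace

# Hypothetical v1 stand-in; RolloutControllerV2 is the name used in the PR.
class RolloutControllerV1:
    pass

class RolloutControllerV2:
    pass

# Backends that the PR routes to the v2 controller.
V2_BACKENDS = {"sglang_remote", "vllm_remote"}

def make_rollout_controller(config):
    """Route remote backends to RolloutControllerV2 when _version is "v2"."""
    backend = getattr(config, "backend", None)
    version = getattr(config, "_version", "v1")
    if backend in V2_BACKENDS and version == "v2":
        return RolloutControllerV2()
    return RolloutControllerV1()

# Example: a v2 config selects the new controller.
cfg = SimpleNamespace(backend="sglang_remote", _version="v2")
controller = make_rollout_controller(cfg)
```

Everything else (configs missing `_version`, non-remote backends) falls through to the v1 path, so the switch is opt-in.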
Partial groups produce inconsistent training data. Reject the entire group if any _run_one call raises, instead of silently returning the successful subset with 0.0 rewards for failures.
Description
Bring GatewayTrainController and RolloutControllerV2 to full parity with v1 controllers, enabling the v2 training pipeline for RL training paths.
Type of Change
Key Changes
V2 Controller Parity
- Route `sglang_remote` and `vllm_remote` to `RolloutControllerV2` when `config._version == "v2"`
- Add `connect_engine` guard address support and `clear_batches` RTensor storage eviction to `GatewayTrainController`
- Route `config_perf_tracer` calls to individual workers instead of relaying through the gateway
- Add `staleness_manager` to `WorkflowExecutor` in `RolloutControllerV2`

AsyncRewardWrapper Lifecycle
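A minimal sketch of the `atexit`-based shutdown this section describes, assuming a lazily created module-level executor; the function name and worker count are hypothetical:

```python
import atexit
from concurrent.futures import ThreadPoolExecutor

_shared_executor = None

def get_shared_executor():
    """Return the process-wide executor, creating it on first use.

    One atexit hook tears it down at interpreter exit, replacing the
    per-instance weakref finalizers + instance counting used previously.
    """
    global _shared_executor
    if _shared_executor is None:
        _shared_executor = ThreadPoolExecutor(max_workers=4)
        atexit.register(_shared_executor.shutdown, wait=True)
    return _shared_executor
```

A single exit hook is simpler to reason about than reference counting, at the cost of keeping the executor alive for the process lifetime.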
- Replace `weakref` finalization + instance counting with `atexit` shutdown for all shared executors
- Reuse `AsyncRewardWrapper` instances in math agent workflows instead of creating one per call

HTTP Client Unification
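The `sock_connect`/`connect` timeouts this section adds to aiohttp sessions might be configured as below; the 5-second values and function names are illustrative, not the PR's actual settings:

```python
import aiohttp

def make_session_timeout() -> aiohttp.ClientTimeout:
    """Build the timeout object applied to each aiohttp ClientSession."""
    return aiohttp.ClientTimeout(
        connect=5.0,       # acquiring a connection, including pool wait
        sock_connect=5.0,  # the TCP connect itself
    )

async def fetch_status(url: str) -> int:
    # Hypothetical usage: every session gets the same timeout policy.
    async with aiohttp.ClientSession(timeout=make_session_timeout()) as session:
        async with session.get(url) as resp:
            return resp.status
```

Without these fields an unreachable host can hang a request until the default total timeout, so bounding the connect phase separately fails fast.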
- Use `create_httpx_client` consistently in `workflow_context.py`
- Add `sock_connect`/`connect` timeouts to aiohttp sessions

Example Configs
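Assembled from the values this PR lists (`mode: inline`, `export_style: individual`, `turn_discount: 1.0`), the new section presumably looks like the following; surrounding keys in the real YAML files may differ:

```yaml
agent:
  mode: inline
  export_style: individual
  turn_discount: 1.0
```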
- Add an `agent:` section (`mode: inline`, `export_style: individual`, `turn_discount: 1.0`) to all example YAML configs
- Switch `RLVRWorkflow` to `MathAgent` in `gsm8k_rl.py`
- Add `max_tokens` to the generation config

Cleanup
- Remove `get_custom_reward_fn` and `VALID_REWARD_FN` from `areal/reward/__init__.py`

Risk Areas
- `get_custom_reward_fn` removed from the reward public API; callers using this function will need to import reward functions directly
- main)

Checklist
- Pre-commit checks pass (`pre-commit run --all-files`)
- Tests added (`tests/test_async_reward_wrapper.py`)
- main/create-pr

Test Commands
Skipped suites: GPU/distributed tests (`tests/grpo/`, `tests/torchrun/`), which require multi-GPU hardware not available locally.