init commit for external agent framework+gateway#25
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces a comprehensive agent framework and gateway system designed to facilitate agentic workflows within a training environment. Key components include a factory for constructing frameworks, an OpenAI-compatible framework implementation that manages sequence generation and trajectory logging, and a gateway system that provides an OpenAI-compatible API for agent interactions. The gateway handles session lifecycle, trajectory buffering, and multimodal data processing. Feedback identifies a critical issue with the incremental token encoding logic in the gateway, which may produce malformed sequences due to assumptions about tokenizer stability and turn separators. Further recommendations include parallelizing reward calculations to improve performance and replacing blocking ray.get calls with asynchronous operations to avoid event loop starvation.
| def _encode_incremental( | ||
| self, | ||
| messages: list[dict[str, Any]], | ||
| image_data: list[Any] | None = None, | ||
| video_data: list[Any] | None = None, | ||
| ) -> list[int]: | ||
| """Encode incremental messages (tool results, user follow-ups) for a continuation turn. | ||
|
|
||
| Uses the remove_system_prompt pattern from ToolAgentLoop: encode the new messages | ||
| alone (which prepends a system prompt), then strip the known system_prompt prefix. | ||
| No tools parameter — tool schema is already in the initial prompt_ids. | ||
| """ | ||
| if self._processor is not None: | ||
| raw_prompt = _apply_chat_template( | ||
| self._processor, | ||
| messages, | ||
| add_generation_prompt=True, | ||
| tokenize=False, | ||
| **self._apply_chat_template_kwargs, | ||
| ) | ||
| videos = video_data | ||
| video_metadata = None | ||
| if videos is not None: | ||
| videos, video_metadata = zip(*videos, strict=False) | ||
| videos, video_metadata = list(videos), list(video_metadata) | ||
| model_inputs = self._processor( | ||
| text=[raw_prompt], | ||
| images=image_data, | ||
| videos=videos, | ||
| video_metadata=video_metadata, | ||
| return_tensors="pt", | ||
| do_sample_frames=False, | ||
| ) | ||
| ids = normalize_token_ids(model_inputs["input_ids"]) | ||
| else: | ||
| ids = normalize_token_ids( | ||
| _apply_chat_template( | ||
| self._tokenizer, messages, add_generation_prompt=True, | ||
| **self._apply_chat_template_kwargs, | ||
| ) | ||
| ) | ||
| return ids[len(self._system_prompt):] |
There was a problem hiding this comment.
The incremental encoding logic is fragile and likely to produce malformed token sequences. Slicing tokens based on the length of a pre-encoded system prompt assumes that the tokenizer is prefix-stable and that the chat template doesn't insert turn separators or special tokens between the system prompt and the first message. Furthermore, concatenating these incremental IDs to the previous turn's response IDs (at line 542) will miss the necessary turn separators (e.g., <|im_end|> and <|im_start|>user) required by most chat templates. It is safer to re-encode the full message history and identify the delta, or simply rely on the backend's prefix caching by sending the full prompt.
| gateway_actor_kwargs["backend"] = self | ||
|
|
||
| self.owned_gateway_actors = [GatewayActor.remote(**gateway_actor_kwargs) for _ in range(gateway_count)] | ||
| ray.get([gateway.start.remote() for gateway in self.owned_gateway_actors]) |
There was a problem hiding this comment.
Using ray.get inside an async context (called via build_agent_framework) will block the event loop, preventing other concurrent tasks from making progress. Since a helper _await_ray_ref is already defined in this file, you should consider moving the gateway startup logic to an async initialization method that can be awaited, rather than performing blocking calls in the constructor.
|
为什么放在trainer目录下?我觉得这是黑盒调用训推通用的流程。我偏向往上提一级,直接放uni_agent/framework和uni_agent/gateway. |
bee0d08 to
ef46265
Compare
| return DataProto(batch=batch, non_tensor_batch=non_tensor_batch) | ||
|
|
||
|
|
||
| class OpenAICompatibleAgentFramework(AgentFramework): |
There was a problem hiding this comment.
Move OpenAICompatibleAgentFramework into a separate file, keep abstract interface only.
There was a problem hiding this comment.
moved the abstract interface to base.py
2da5be1 to
825b7f3
Compare
825b7f3 to
9c7c97a
Compare
a7e392b to
9677db1
Compare
|
The current entry point binds a single runner via We may introduce an AgentRunner abstract base with a minimal run() contract: Each sample carries the runner name; config mounts a name → runner map. The config could be in the following format: Then the framework resolves the runner per-session by sample["agent_runner_name"], like: This could be similar to verl's existing |
|
Hi, I would like to propose using Prefix Trie for multi-trajectory storage for Agentgateway. My RFC is here:#51
For detailed explanation, please also refer to this comment: verl-project/verl#6299 (comment) |
…nt) (#52) ### What does this PR do? Adds `examples/swe_agent/` — an end-to-end recipe for training a SWE-bench coding agent with fully-async RL (Megatron actors + vLLM rollout on separate nodes) and Modal swe-rex sandboxes. It stitches the existing building blocks into something runnable, mirroring `examples/search_agent/`: - data: `examples/data_preprocess/swe_rebench.py` + `swe_bench_verified.py` - reward: `uni_agent.reward.swe_rebench` / `swe_bench` - rollout: `uni_agent.agent_loop.UniAgentLoop` (Modal swe-rex) Reference config trains Qwen3-235B-A22B-Instruct-2507 with GRPO on a 12-node (8 train + 4 rollout) × 4-GPU topology; everything is env-overridable to scale down. ### Checklist Before Starting - [x] Search for similar PRs/issues: - `gh pr list --repo verl-project/uni-agent --state open` → no SWE-bench training example (PR #25 is an unrelated agent-framework/gateway) - no existing `examples/swe*` dir - [x] Format the PR title as `[examples] feat: ...` ### Test This is a recipe (scripts + configs + docs), not library code: - `bash -n train_qwen3_235b_swebench.sh` — OK - `python -c "import yaml; yaml.safe_load(...)"` on both YAMLs — OK - `pre-commit run --files examples/swe_agent/*` — pass (compile-all; ruff/mypy skip non-py) - `shellcheck` — clean except style-only SC2206 on the hydra arg-array append, consistent with the repo's other launch scripts Full end-to-end training was run internally on the reference topology; the committed files are the scrubbed/generalized form of that setup (no secrets or site-specific paths — `runtime_env.yaml` ships placeholders only). ### Files | File | Purpose | |---|---| | `train_qwen3_235b_swebench.sh` | `ray job submit` + full GRPO / Megatron / vLLM config; topology & paths are env vars | | `agent_config.yaml` | UniAgentLoop config: tools, Modal deployment, rollout concurrency, reward | | `runtime_env.yaml` | Ray runtime-env **template** (placeholders for Modal / W&B tokens + checkout paths) | | `README.md` | dataset → runtime_env → launch → monitor + tuning notes | ### Notes captured for reproducibility Non-obvious settings learned running this at scale (documented in the script header / README): - `max_response_length=128K` — SWE-bench trajectories are long (mean ~70K tokens, ~90 turns); 32K truncates ~half - `tool_parser: hermes` for Qwen3-235B (wrong parser silently breaks tool calls) - `moe_token_dispatcher_type=alltoall` — portable MoE dispatch - `VLLM_USE_DEEP_GEMM=0` — vLLM 0.21 EP/CUTLASS init workaround - do **not** set `expandable_segments:True` (incompatible with vLLM sleep-mode CuMemAllocator, pytorch#147851) ### Checklist Before Submitting - [x] Read the Contribute Guide - [x] `pre-commit run --files examples/swe_agent/*` passed - [x] No new library code → no unit tests; recipe validated via syntax/lint + internal end-to-end run - [x] AI assistance was used (Claude Code); the submitting human (@aoshen02) reviewed every line - [x] No secrets / site-specific paths committed --------- Signed-off-by: aoshen02 <aoshen@inferact.ai> Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Co-authored-by: yuyangding <yuyangding@bytedance.com>
|
Hi, I noticed that, in the original verl Suggestion: Refer the original verl AgentLoopManager pattern — introduce multiple AgentLoopWorker Ray actors, partition the batch tasks across them. # refer AgentLoopManager._init_agent_loop_workers()
for i in range(num_workers):
worker = AgentLoopWorker.options(
scheduling_strategy=NodeAffinitySchedulingStrategy(node_id=..., soft=True)
).remote(config, ...)
self.workers.append(worker)
async def generate_sequences(self, prompts):
chunks = prompts.chunk(len(self.workers))
await asyncio.gather(*[
w.generate_sequences.remote(chunk)
for w, chunk in zip(self.workers, chunks)
])This separates two independent concurrency axes: gateway_count for LLM serving throughput, num_workers for agent execution parallelism — consistent with the original verl design. |
|
| from verl.workers.rollout.utils import run_uvicorn | ||
|
|
||
|
|
||
| class _GatewayActor: |
There was a problem hiding this comment.
Why not just decorate it with @ray.remote?
@ray.remote
class GatewayActor:
...Split from PR verl-project#25 per maintainer request: gateway is the first independently-reviewable PR. Owns SessionHandle/Trajectory (moved from framework.types). No framework dependency. Spec: cxb_dev/docs/plans/2026-06-03-pr25-split-gateway-framework-deepeyes-design.md Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
P0 follow-up to PR verl-project#25 review: docstring every public class, method, field, and function in the gateway package. Pure documentation; zero behavior change. Full regression 50 passed unchanged. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
a300294 to
a17e1ed
Compare
Introduce uni_agent/trainer/gateway/protocol.py with OpenAI-compatible ChatCompletionRequest / ChatCompletionResponse TypedDicts. _handle_chat_completions now annotates its payload as ChatCompletionRequest and constructs the response via ChatCompletionResponse local instead of an anonymous dict. Response gains the OpenAI-standard `created` (unix ts) and `model` fields; `model` falls back to "unknown" when the request omits it to avoid breaking direct-call test payloads. MessageCodec runtime validation, GatewaySession envelope, GenerationOutcome contract, Trajectory token-truth all unchanged. No pydantic, no openai SDK runtime dependency. Spec: cxb_dev/docs/plans/2026-06-04-gateway-openai-sdk-typed-io-design.md Addresses PR verl-project#25 wuxibin89 review: typed request/response. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
a17e1ed to
d0ad4af
Compare
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
d0ad4af to
348f520
Compare
Thanks for the comment, this is a valid point. The current framework path fans out all Since I have split the current PR into three separate PRs (gateway, framework, and examples), I will treat this as a framework-layer follow-up and address it in the next PR instead of this one. |
What does this PR do?
This PR adds a trainer-side agent framework and gateway runtime for multi-turn agent-style rollout in uni-agent, as a downstream integration of verl RFC #5790 and the upstream agent framework PR verl#6299.
Specifically, it:
uni_agent.trainer.framework—AgentFrameworkabstract base,OpenAICompatibleAgentFrameworkconcrete implementation, andAgentFrameworkRolloutAdapter(satisfies the trainer'sagent_loop_manager_classextension point; recipes wire it in via YAML with no per-recipe glue),uni_agent.trainer.gateway—_GatewayActor/GatewayManager/GatewayServingRuntimefor OpenAI-compatible session serving, sticky session routing, tool-parser wiring, and multimodal media accumulation; backend routing delegates toLLMServerClient,examples/deepeyes/,Wave 2 additions (length budget enforcement + OpenAI parity):
prompt_length/response_lengthbudget injected into_GatewayActor; continuation turns clamped to remaining budget; budget-exhausted turns materialise a syntheticfinish_reason=lengthresponse without hitting the backend.{"error": {"message": …, "type": …, "code": …}}); encode/decode failures caught and surfaced as 400.chat_template_kwargsforwarded toapply_chat_template;reasoning_contentpreserved through_normalize_messageand prefix comparison.n>1,response_format,tool_choice=required/function) rejected with 400;tool_choice="none"supported (skips tool injection and parser).3c5f6e04(verl PR #6129: moveLLMServerManagerout ofAgentLoopManager) so reviewers cangit submodule update --initwithout access to a private fork.Checklist Before Starting
https://github.com/verl-project/uni-agent/pulls?q=is%3Apr+gateway+framework
[{modules}] {type}: {description}[trainer] feat: add agent framework and gateway runtimeTest
PYTHONPATH=$(pwd) pytest tests/uni_agent/trainer/ -qResult: 64 passed, 6 warnings (framework, gateway, runtime, multimodal postprocess).
Real-rollout evidence from the deepeyes gateway recipe: a 50-step GRPO run on multi-turn multimodal data (Qwen3.5-4B, 7× RTX 3090 train + 1× local judge) produced a real learning curve —
critic/rewards/meanmoved from ~0.21 at step 1 to ~1.86 by step 50.API and Usage Example
Public APIs added:
uni_agent.trainer.framework—AgentFramework,OpenAICompatibleAgentFramework,AgentFrameworkRolloutAdapter,build_agent_frameworkuni_agent.trainer.gateway—GatewayServingRuntime,GatewayManager,GatewayActorMinimum viable wiring via YAML config:
The adapter calls
build_agent_framework()which wiresGatewayServingRuntimeand the framework subclass from config. The agent runner only needs the gateway base URL:generate_sequences()writes finalized trajectories directly to TransferQueue with key"{uid}_{session_id}_{index}", matchingAgentLoopWorkerTQ._agent_loop_postprocess()'s field / tag layout.Design & Code Changes
High-level changes:
AgentFrameworkbase class +OpenAICompatibleAgentFrameworkown session orchestration (create_session→agent_runner→finalize_session), trajectory assembly, multimodal post-processing, reward scoring, and TransferQueue writes. Per-session failures are isolated viaasyncio.gather(..., return_exceptions=True)so one bad session does not cancel the rest of the batch._GatewayActorprovides OpenAI Chat Completions over sticky sessions with prefix-consistency checks, tool-parser decoding, multimodal media accumulation, and rollout budget enforcement.GatewayManagerroutes new sessions by least-active count.GatewayServingRuntimeowns gateway actor lifecycle and delegates backend routing toLLMServerClient.multi_modal_inputsand(4, seq_len)position ids inside the framework, so VLM sessions do not need per-recipe glue.AgentFrameworkRolloutAdaptersatisfies the trainer'sagent_loop_manager_classcontract; every recipe wires the same class in YAML — no per-recipe adapter code.WIP / Follow-up
GatewayActordefault placement strategy (at least one per node) once multi-node validation is inChecklist Before Submitting
*_on_cpu.pynaming convention.pre-commit install && pre-commit run --all-files