[examples] feat: add end-to-end SWE-bench RL training recipe (swe_agent) by aoshen02 · Pull Request #52 · verl-project/uni-agent

aoshen02 · 2026-05-29T08:46:03Z

What does this PR do?

Adds examples/swe_agent/ — an end-to-end recipe for training a
SWE-bench coding agent with fully-async RL (Megatron actors + vLLM
rollout on separate nodes) and Modal swe-rex sandboxes.

It stitches the existing building blocks into something runnable,
mirroring examples/search_agent/:

data: examples/data_preprocess/swe_rebench.py + swe_bench_verified.py
reward: uni_agent.reward.swe_rebench / swe_bench
rollout: uni_agent.agent_loop.UniAgentLoop (Modal swe-rex)

Reference config trains Qwen3-235B-A22B-Instruct-2507 with GRPO on a
12-node (8 train + 4 rollout) × 4-GPU topology; everything is
env-overridable to scale down.

Checklist Before Starting

Search for similar PRs/issues:
- gh pr list --repo verl-project/uni-agent --state open → no SWE-bench training example (PR #25, "init commit for external agent framework+gateway", is an unrelated agent-framework/gateway change)
- no existing examples/swe* dir
Format the PR title as [examples] feat: ...

Test

This is a recipe (scripts + configs + docs), not library code:

bash -n train_qwen3_235b_swebench.sh — OK
python -c "import yaml; yaml.safe_load(...)" on both YAMLs — OK
pre-commit run --files examples/swe_agent/* — pass (compile-all; ruff/mypy skip non-py)
shellcheck — clean except style-only SC2206 on the hydra arg-array append, consistent with the repo's other launch scripts

Full end-to-end training was run internally on the reference topology;
the committed files are the scrubbed/generalized form of that setup
(no secrets or site-specific paths — runtime_env.yaml ships
placeholders only).

Files

| File | Purpose |
| --- | --- |
| `train_qwen3_235b_swebench.sh` | `ray job submit` + full GRPO / Megatron / vLLM config; topology & paths are env vars |
| `agent_config.yaml` | UniAgentLoop config: tools, Modal deployment, rollout concurrency, reward |
| `runtime_env.yaml` | Ray runtime-env template (placeholders for Modal / W&B tokens + checkout paths) |
| `README.md` | dataset → runtime_env → launch → monitor + tuning notes |

Notes captured for reproducibility

Non-obvious settings learned running this at scale (documented in the
script header / README):

max_response_length=128K — SWE-bench trajectories are long (mean ~70K tokens, ~90 turns); 32K truncates ~half
tool_parser: hermes for Qwen3-235B (wrong parser silently breaks tool calls)
moe_token_dispatcher_type=alltoall — portable MoE dispatch
VLLM_USE_DEEP_GEMM=0 — vLLM 0.21 EP/CUTLASS init workaround
do not set expandable_segments:True (incompatible with vLLM sleep-mode CuMemAllocator, pytorch#147851)
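
The last two notes can be checked mechanically before a launch. The sketch below is a hypothetical pre-flight guard, not part of the committed recipe: it assumes the allocator setting would arrive through PyTorch's `PYTORCH_CUDA_ALLOC_CONF` environment variable, and the function name `check_alloc_conf` is illustrative only.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight guard (not part of the committed recipe).
# Returns non-zero if the allocator config string enables expandable segments,
# which the notes above flag as incompatible with vLLM sleep mode.
check_alloc_conf() {
  local conf="${1:-}"
  case "$conf" in
    *expandable_segments:True*)
      echo "refusing to launch: expandable_segments:True is set in '$conf'" >&2
      return 1
      ;;
  esac
  return 0
}

# Workaround quoted in the notes above.
export VLLM_USE_DEEP_GEMM=0

if check_alloc_conf "${PYTORCH_CUDA_ALLOC_CONF:-}"; then
  echo "allocator config OK"
fi
```

A guard like this would sit near the top of a launch script, before `ray job submit`, so a bad allocator setting fails fast instead of surfacing later inside a rollout worker.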

Checklist Before Submitting

Read the Contribute Guide
pre-commit run --files examples/swe_agent/* passed
No new library code → no unit tests; recipe validated via syntax/lint + internal end-to-end run
AI assistance was used (Claude Code); the submitting human (@aoshen02) reviewed every line
No secrets / site-specific paths committed

@aoshen02

Adds examples/swe_agent/, an end-to-end recipe for training a SWE-bench coding agent with fully-async RL (Megatron actors + vLLM rollout) and Modal swe-rex sandboxes.

It stitches together the existing building blocks (examples/data_preprocess/swe_rebench.py + swe_bench_verified.py for data, uni_agent.reward.swe_rebench / swe_bench for reward, uni_agent.agent_loop.UniAgentLoop for rollout) into a runnable launch script + configs + README, mirroring the structure of examples/search_agent/.

Reference config trains Qwen3-235B-A22B-Instruct-2507 with GRPO on a 12-node (8 train + 4 rollout) x 4-GPU topology, but everything is env-overridable to scale down.

Files:
- train_qwen3_235b_swebench.sh: ray job submit + full GRPO / Megatron / vLLM config. Topology, paths, and addresses are env vars.
- agent_config.yaml: UniAgentLoop config (tools, Modal deployment, rollout concurrency, reward).
- runtime_env.yaml: Ray runtime-env template (placeholders for Modal / W&B tokens and checkout paths).
- README.md: dataset -> runtime_env -> launch -> monitor, plus tuning notes.

The script header / README capture a few non-obvious settings learned from running this at scale: max_response_length=128K (SWE-bench trajectories are long), tool_parser=hermes for Qwen3-235B, moe_token_dispatcher_type=alltoall, VLLM_USE_DEEP_GEMM=0 for vLLM 0.21 EP init, and the expandable_segments / CuMemAllocator incompatibility.

No secrets or environment-specific paths are committed; runtime_env.yaml ships placeholders only.

This PR includes AI assistance (Claude Code). The submitting human (@aoshen02) reviewed every line.

Signed-off-by: aoshen02 <aoshen@inferact.ai>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

gemini-code-assist

Code Review

This pull request introduces an end-to-end recipe for training a SWE-bench coding agent using the Uni-Agent framework with fully-async RL and Modal sandboxes. It includes a README, agent loop configuration, Ray runtime environment template, and a launch script for training Qwen3-235B-A22B-Instruct-2507. The review feedback suggests correcting the default dataset paths in the launch script to include the _modal suffix, and double-quoting several variables and arguments (such as paths, addresses, and parameters containing square brackets) to prevent word splitting and globbing issues in Bash.

gemini-code-assist · 2026-05-29T08:48:34Z

+DATA_ROOT=${DATA_ROOT:-/path/to/data-root}
+EXAMPLE_DIR=${EXAMPLE_DIR:-$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)}
+# Qwen3-235B-A22B-Instruct-2507 snapshot (first/only snapshot under the HF cache).
+MODEL_PATH=${MODEL_PATH:-$(ls -d ${DATA_ROOT}/hf-models/hub/models--Qwen--Qwen3-235B-A22B-Instruct-2507/snapshots/*/ 2>/dev/null | head -1)}


Double-quote ${DATA_ROOT} inside the command substitution to prevent word splitting if the path contains spaces.

Suggested change

MODEL_PATH=${MODEL_PATH:-$(ls -d ${DATA_ROOT}/hf-models/hub/models--Qwen--Qwen3-235B-A22B-Instruct-2507/snapshots/*/ 2>/dev/null | head -1)}

MODEL_PATH=${MODEL_PATH:-$(ls -d "${DATA_ROOT}"/hf-models/hub/models--Qwen--Qwen3-235B-A22B-Instruct-2507/snapshots/*/ 2>/dev/null | head -1)}
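
To illustrate the word-splitting concern behind this suggestion (and the later ones for `$RAY_ADDRESS`, `${MODEL_PATH}`, and `${AGENT_CONFIG_PATH}`), here is a minimal sketch. The helper `count_args` and the path with a space are made up for the demonstration; default `IFS` is assumed.

```shell
#!/usr/bin/env bash
# Count how many arguments a command would receive.
count_args() { echo "$#"; }

# Hypothetical data root containing a space (illustration only).
DATA_ROOT="/tmp/data root"

# Unquoted: the expansion is split on the space into two words.
unquoted=$(count_args ${DATA_ROOT}/hf-models)
# Quoted: the path stays a single word.
quoted=$(count_args "${DATA_ROOT}"/hf-models)

echo "unquoted -> ${unquoted} argument(s)"   # prints: unquoted -> 2 argument(s)
echo "quoted   -> ${quoted} argument(s)"     # prints: quoted   -> 1 argument(s)
```

With the unquoted form, `ls -d` would be handed two paths that do not exist, and `MODEL_PATH` would end up empty because stderr is discarded.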

gemini-code-assist · 2026-05-29T08:48:34Z

+TRAIN_FILE=${TRAIN_FILE:-${DATA_ROOT}/data/swe_agent/swe_rebench_filtered.parquet}
+TEST_FILE=${TEST_FILE:-${DATA_ROOT}/data/swe_agent/swe_bench_verified.parquet}


The preprocessing scripts swe_rebench.py and swe_bench_verified.py append the deployment implementation suffix (e.g., _modal.parquet) to the output filenames. Since this training script is configured for Modal, the default paths should include the _modal suffix to avoid file-not-found errors out of the box.

Suggested change

TRAIN_FILE=${TRAIN_FILE:-${DATA_ROOT}/data/swe_agent/swe_rebench_filtered.parquet}

TEST_FILE=${TEST_FILE:-${DATA_ROOT}/data/swe_agent/swe_bench_verified.parquet}

TRAIN_FILE=${TRAIN_FILE:-${DATA_ROOT}/data/swe_agent/swe_rebench_filtered_modal.parquet}

TEST_FILE=${TEST_FILE:-${DATA_ROOT}/data/swe_agent/swe_bench_verified_modal.parquet}
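
A filename mismatch like this is easy to catch before submitting the job. The following is a sketch of an optional pre-flight check, not something the PR contains; `require_file` is a hypothetical helper, and the defaults use the `_modal` names from the suggestion above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fail early with a readable message if a dataset is missing.
require_file() {
  local label="$1" path="$2"
  if [ ! -f "$path" ]; then
    echo "missing ${label}: ${path} (check the preprocessing output name)" >&2
    return 1
  fi
}

DATA_ROOT=${DATA_ROOT:-/path/to/data-root}
TRAIN_FILE=${TRAIN_FILE:-${DATA_ROOT}/data/swe_agent/swe_rebench_filtered_modal.parquet}
TEST_FILE=${TEST_FILE:-${DATA_ROOT}/data/swe_agent/swe_bench_verified_modal.parquet}

if require_file "TRAIN_FILE" "$TRAIN_FILE" && require_file "TEST_FILE" "$TEST_FILE"; then
  echo "datasets found"
else
  echo "dataset check failed; not submitting" >&2
fi
```

In a real launch script the failing branch would likely `exit 1` before `ray job submit`, so the error appears locally rather than inside the Ray job logs.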

gemini-code-assist · 2026-05-29T08:48:34Z

+  actor_rollout_ref.model.mtp.enable_rollout=False
+)
+
+CHECKPOINT_CONTENTS=['model','hf_model','extra']


The CHECKPOINT_CONTENTS variable contains square brackets [ and ] which are globbing characters in bash. It should be quoted to prevent accidental shell expansion.

Suggested change

CHECKPOINT_CONTENTS=['model','hf_model','extra']

CHECKPOINT_CONTENTS="['model','hf_model','extra']"
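
One detail worth noting here: bash does not apply pathname expansion to the right-hand side of a plain assignment, but it does apply quote removal, so the two forms store different strings. The sketch below shows the difference; the variable names are illustrative, and whether the downstream Hydra override parser treats the two values differently is not stated in this thread.

```shell
#!/usr/bin/env bash
# Unquoted: the shell removes the inner single quotes during assignment.
unquoted=['model','hf_model','extra']
# Double-quoted: the inner single quotes are kept verbatim.
quoted="['model','hf_model','extra']"

printf 'unquoted: %s\n' "$unquoted"   # prints: unquoted: [model,hf_model,extra]
printf 'quoted:   %s\n' "$quoted"     # prints: quoted:   ['model','hf_model','extra']
```

Quoting the assignment therefore preserves the value exactly as written, which is the safer default even where globbing is not the mechanism at play.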

gemini-code-assist · 2026-05-29T08:48:34Z

+
+CHECKPOINT_CONTENTS=['model','hf_model','extra']
+
+ray job submit --no-wait --address=$RAY_ADDRESS --runtime-env $RUNTIME_ENV \


Double-quote $RAY_ADDRESS and $RUNTIME_ENV to prevent word splitting and globbing if the paths or addresses contain spaces or special characters.

Suggested change

ray job submit --no-wait --address=$RAY_ADDRESS --runtime-env $RUNTIME_ENV \

ray job submit --no-wait --address="$RAY_ADDRESS" --runtime-env "$RUNTIME_ENV" \

gemini-code-assist · 2026-05-29T08:48:34Z

+    -- python3 -m verl.experimental.fully_async_policy.fully_async_main \
+    --config-path=config \
+    --config-name='fully_async_ppo_megatron_trainer.yaml' \
+    hydra.searchpath=[pkg://verl.trainer.config] \


The hydra.searchpath argument contains square brackets [ and ] which are globbing characters in bash. It should be double-quoted to prevent accidental pathname expansion.

Suggested change

hydra.searchpath=[pkg://verl.trainer.config] \

hydra.searchpath="[pkg://verl.trainer.config]" \

gemini-code-assist · 2026-05-29T08:48:34Z

+    actor_rollout_ref.actor.ppo_max_token_len_per_gpu=${actor_ppo_max_token_len} \
+    actor_rollout_ref.ref.log_prob_max_token_len_per_gpu=${infer_ppo_max_token_len} \
+    actor_rollout_ref.rollout.log_prob_max_token_len_per_gpu=${infer_ppo_max_token_len} \
+    actor_rollout_ref.model.path=${MODEL_PATH} \


Double-quote ${MODEL_PATH} to prevent word splitting if the model path contains spaces or special characters.

Suggested change

actor_rollout_ref.model.path=${MODEL_PATH} \

actor_rollout_ref.model.path="${MODEL_PATH}" \

gemini-code-assist · 2026-05-29T08:48:34Z

+    actor_rollout_ref.actor.optim.clip_grad=1.0 \
+    actor_rollout_ref.actor.loss_agg_mode=${loss_agg_mode} \
+    actor_rollout_ref.actor.checkpoint.async_save=False \
+    actor_rollout_ref.actor.checkpoint.save_contents=${CHECKPOINT_CONTENTS} \


The ${CHECKPOINT_CONTENTS} variable should be double-quoted to prevent globbing and word splitting by the shell.

Suggested change

actor_rollout_ref.actor.checkpoint.save_contents=${CHECKPOINT_CONTENTS} \

actor_rollout_ref.actor.checkpoint.save_contents="${CHECKPOINT_CONTENTS}" \

gemini-code-assist · 2026-05-29T08:48:35Z

+    actor_rollout_ref.rollout.multi_turn.enable=True \
+    actor_rollout_ref.rollout.multi_turn.max_parallel_calls=1 \
+    actor_rollout_ref.rollout.agent.num_workers=8 \
+    actor_rollout_ref.rollout.agent.agent_loop_config_path=${AGENT_CONFIG_PATH} \


Double-quote ${AGENT_CONFIG_PATH} to prevent word splitting if the path contains spaces or special characters.

Suggested change

actor_rollout_ref.rollout.agent.agent_loop_config_path=${AGENT_CONFIG_PATH} \

actor_rollout_ref.rollout.agent.agent_loop_config_path="${AGENT_CONFIG_PATH}" \

gemini-code-assist · 2026-05-29T08:48:35Z

+    algorithm.rollout_correction.rollout_is=${rollout_is} \
+    algorithm.rollout_correction.rollout_rs=${rollout_rs} \
+    algorithm.rollout_correction.rollout_rs_threshold=${rollout_rs_threshold} \
+    trainer.logger=['console','wandb'] \


The trainer.logger argument contains square brackets [ and ] which are globbing characters in bash. It should be double-quoted to prevent accidental pathname expansion if files matching the pattern exist in the working directory.

Suggested change

trainer.logger=['console','wandb'] \

trainer.logger="['console','wandb']" \
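
Unlike an assignment, a command argument does undergo pathname expansion. A minimal sketch of the failure mode described above, run in a throwaway directory: the stray file name and the `first_arg` helper are contrived for the demonstration.

```shell
#!/usr/bin/env bash
# Show the argument a command would actually receive.
first_arg() { printf '%s\n' "$1"; }

workdir=$(mktemp -d)
cd "$workdir" || exit 1
# Hypothetical stray file whose name happens to match the bracket pattern.
touch 'trainer.logger=c'

# Unquoted: [...] is a glob character class, so the word is replaced by the match.
unquoted=$(first_arg trainer.logger=['console','wandb'])
# Double-quoted: the word is passed through literally.
quoted=$(first_arg "trainer.logger=['console','wandb']")

echo "unquoted -> ${unquoted}"
echo "quoted   -> ${quoted}"

cd / && rm -rf "$workdir"
```

Such a matching file is unlikely in practice, but shell options like `failglob` or `nullglob` can also change what the unquoted word becomes when nothing matches, so quoting keeps the override stable across environments.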

gemini-code-assist Bot reviewed May 29, 2026

rename folder

31b8d53

yyDing1 merged commit bdf8ae8 into verl-project:main Jun 1, 2026
3 checks passed

yyDing1 deleted the examples/swe-agent-recipe branch June 1, 2026 09:26

yyDing1 merged 2 commits into verl-project:main from aoshen02:examples/swe-agent-recipe
