[data] feat: add support for swe bench multilingual by yyDing1 · Pull Request #55 · verl-project/uni-agent

yyDing1 · 2026-06-07T07:11:34Z

Add SWE-bench Multilingual support

Adds end-to-end support for the SWE-bench/SWE-bench_Multilingual dataset (300 instances, 7 languages: c/go/java/js/php/ruby/rust), graded via the official swebench harness.

Changes

- Reward spec: new swe_bench_multilingual reward spec + registration in registry.py / reward/__init__.py.
- Data preprocessing: examples/data_preprocess/swe_bench_multilingual.py emits the SWE-agent format using the public swebench/sweb.eval.* images, and caps each Modal sandbox at 4 cores / 8 GiB via modal_sandbox_kwargs.
- Modal deployment (deployment.py):
  - pass DEBIAN_FRONTEND=noninteractive to apt_install so non-Python eval images don't hang on a tzdata prompt during image build;
  - coerce cpu/memory list→tuple, since a parquet round-trip turns the (request, limit) tuples into lists that Modal would otherwise silently ignore;
  - raise default max-starting-per-worker 8→64.
- swe_bench.py: reset only the modified test files to base_commit instead of git checkout master/main, fixing repos whose default branch isn't master/main.
- Verify script (parallel_verify_swe.py): tqdm progress + summary, per-instance filtering, and per-case deployment config merged over defaults.
- Docs: add a SWE-bench Multilingual results table.
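
Two of the items above (per-case deployment config merged over defaults, and the cpu/memory list→tuple coercion) can be illustrated together. This is a minimal sketch, not the code from deployment.py or parallel_verify_swe.py: the function names, the config keys other than cpu/memory, and the numeric values are assumptions made for the example.

```python
from typing import Any, Optional


def coerce_resource(value: Any) -> Any:
    """Turn a [request, limit] list back into a (request, limit) tuple.

    Scalars and existing tuples are returned unchanged.
    """
    if isinstance(value, list):
        return tuple(value)
    return value


def merge_deployment_config(defaults: dict, per_case: Optional[dict]) -> dict:
    """Overlay per-case settings on the defaults, then normalise cpu/memory."""
    merged = {**defaults, **(per_case or {})}
    for key in ("cpu", "memory"):
        if key in merged:
            merged[key] = coerce_resource(merged[key])
    return merged


# Hypothetical values: the per-case entry carries lists, as it might after
# being written to and read back from parquet.
defaults = {"cpu": (1, 2), "memory": (2048, 4096), "timeout": 600}
per_case = {"cpu": [4, 4], "memory": [8192, 8192]}

config = merge_deployment_config(defaults, per_case)
print(config)
```

Doing the coercion after the merge means it covers values from either source, whichever one happened to pass through parquet.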

Test plan

DEPLOYMENT=modal python examples/data_preprocess/swe_bench_multilingual.py produces a 300-row parquet.
Gold-patch verification via parallel_verify_swe.py resolves the dataset; remaining failures are upstream data/environment issues (e.g. unpinned Rust deps, network-dependent gradle bootstrap), not harness/spec bugs.

gemini-code-assist

Code Review

This pull request introduces support for the SWE-bench Multilingual dataset, including a preprocessing script, a new reward specification, and registration of the multilingual reward. It also refactors the parallel verification script to support progress streaming, improves test file resetting in the standard SWE-bench reward, and tunes Modal deployment configurations. The review feedback highlights critical issues: class-level instantiation of 'asyncio.Semaphore' in the Ray actor will raise a 'RuntimeError' in Python 3.11+, using check='ignore' during patch application silently masks failures, and direct dictionary access on 'MAP_REPO_TO_EXT' risks a 'KeyError'.
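
The test-file reset mentioned in this summary can be sketched roughly as follows. This is an illustration under stated assumptions rather than the code in swe_bench.py: the helper names are hypothetical, the file list is taken from the `diff --git` headers of the test patch, and paths containing whitespace are not handled.

```python
import re
import shlex

# Matches the header line that starts each file section of a git diff.
DIFF_HEADER = re.compile(r"^diff --git a/(\S+) b/(\S+)$", re.MULTILINE)


def modified_files(patch: str) -> list[str]:
    """Return the paths named in the patch's diff headers, in order, without repeats."""
    seen: dict[str, None] = {}
    for match in DIFF_HEADER.finditer(patch):
        seen.setdefault(match.group(2), None)
    return list(seen)


def reset_command(base_commit: str, test_patch: str) -> str:
    """Build a command that restores only the touched test files to base_commit."""
    files = modified_files(test_patch)
    if not files:
        return ""  # nothing to reset
    quoted = " ".join(shlex.quote(path) for path in files)
    return f"git checkout {shlex.quote(base_commit)} -- {quoted}"


example_patch = (
    "diff --git a/tests/test_a.py b/tests/test_a.py\n"
    "--- a/tests/test_a.py\n"
    "+++ b/tests/test_a.py\n"
    "@@ -1 +1 @@\n"
    "-old\n"
    "+new\n"
    "diff --git a/spec/b_spec.rb b/spec/b_spec.rb\n"
)
print(reset_command("abc123", example_patch))
```

Because the command names a commit rather than a branch, it does not depend on the repository's default branch. Files that the test patch creates do not exist at base_commit, so a real implementation would need to handle those paths separately.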

gemini-code-assist · 2026-06-07T07:13:09Z

+        last_error: Exception | None = None
+        for cmd in commands:
+            try:
+                await self.env.communicate(cmd, check="ignore")


Using check="ignore" here causes self.env.communicate to silently succeed even if the patch application command fails (returns a non-zero exit code). As a result, the loop will immediately log "Applied patch successfully!" and return, without ever trying the fallback patch application strategies or raising an error if all strategies fail.

Change check="ignore" to check="raise" so that failed patch attempts raise a RuntimeError and trigger the fallback commands.

Suggested change

-                await self.env.communicate(cmd, check="ignore")
+                await self.env.communicate(cmd, check="raise")
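
To make the control flow concrete, here is a small self-contained sketch of a fallback loop that depends on failures being raised. FakeEnv is a hypothetical stand-in for the PR's environment object (only communicate is modelled, and it is assumed to raise RuntimeError on failure when check="raise"); the command strings are illustrative.

```python
import asyncio


class FakeEnv:
    """Hypothetical stand-in for the real environment; records every command."""

    def __init__(self, failing: set[str]) -> None:
        self.failing = failing
        self.calls: list[str] = []

    async def communicate(self, cmd: str, check: str = "raise") -> str:
        self.calls.append(cmd)
        if cmd in self.failing and check == "raise":
            raise RuntimeError(f"command failed: {cmd}")
        return ""


async def apply_patch(env: FakeEnv, commands: list[str]) -> str:
    """Try each strategy in order and return the first one that succeeds."""
    last_error = None
    for cmd in commands:
        try:
            await env.communicate(cmd, check="raise")
        except RuntimeError as exc:
            last_error = exc  # remember the failure and try the next strategy
            continue
        return cmd
    raise RuntimeError("all patch strategies failed") from last_error


COMMANDS = [
    "git apply --verbose patch.diff",
    "git apply --verbose --reject patch.diff",
    "patch --batch --fuzz=5 -p1 -i patch.diff",
]

env = FakeEnv(failing={COMMANDS[0]})
used = asyncio.run(apply_patch(env, COMMANDS))
print(used, env.calls)
```

With check="ignore" in place of check="raise", the stand-in never raises, so the first command would be reported as successful and the later strategies would never run.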

gemini-code-assist · 2026-06-07T07:13:10Z

+    _semaphore = asyncio.Semaphore(max(1, GLOBAL_CONCURRENCY // NUM_WORKERS))

    async def run_single(self, sample):
        async with self._semaphore:
            return await run_sample(sample)


Instantiating asyncio.Semaphore at the class level (outside of an active event loop) will raise a RuntimeError: no running event loop in Python 3.11+ during module import or class definition. Additionally, because Ray actors run in their own event loops, class-level semaphores can be bound to the wrong event loop, leading to RuntimeError: Task got Future attached to a different loop.

To fix this, lazily initialize the semaphore inside the actor's methods or inside the actor's initialization context.

    def __init__(self):
        self._semaphore = None

    async def run_single(self, sample):
        if self._semaphore is None:
            self._semaphore = asyncio.Semaphore(max(1, GLOBAL_CONCURRENCY // NUM_WORKERS))
        async with self._semaphore:
            return await run_sample(sample)
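
A runnable version of the lazy-initialization pattern suggested above, without Ray, might look like the sketch below. The Worker class, the constant values, and the sleep that stands in for run_sample are assumptions made for the demo; the peak counter exists only to show that the limit is enforced.

```python
import asyncio

GLOBAL_CONCURRENCY = 8  # hypothetical values for the demo
NUM_WORKERS = 4


class Worker:
    def __init__(self) -> None:
        self._semaphore = None  # created on first use, inside the running loop
        self.active = 0
        self.peak = 0

    def _get_semaphore(self) -> asyncio.Semaphore:
        if self._semaphore is None:
            limit = max(1, GLOBAL_CONCURRENCY // NUM_WORKERS)
            self._semaphore = asyncio.Semaphore(limit)
        return self._semaphore

    async def run_single(self, sample: int) -> int:
        async with self._get_semaphore():
            self.active += 1
            self.peak = max(self.peak, self.active)
            try:
                await asyncio.sleep(0.01)  # stand-in for run_sample(sample)
                return sample * 2
            finally:
                self.active -= 1


async def main() -> tuple[list[int], int]:
    worker = Worker()
    results = await asyncio.gather(*(worker.run_single(i) for i in range(10)))
    return results, worker.peak


results, peak = asyncio.run(main())
print(results, peak)
```

Each Worker instance owns its semaphore, so the limit applies per worker and the semaphore is only ever used from the loop that created it.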

yyDing1 added 4 commits June 5, 2026 15:44

update

a8f8a08

update

772a982

update

d363197

update

25fedad

gemini-code-assist Bot reviewed Jun 7, 2026

updayte

e743727

yyDing1 merged commit 01d6c43 into main Jun 7, 2026
4 checks passed

yyDing1 deleted the swe-bench-multilingual branch June 7, 2026 07:28

yyDing1 merged 5 commits into main from swe-bench-multilingual
