Conversation
Signed-off-by: cmunley1 <[email protected]>
Code Review
This pull request introduces the NVIDIA NeMo Gym integration for verl, enabling scalable multi-environment RL training. Key additions include the NemoGymAgentLoopManager for rollout collection, a JSONL dataset loader, and vLLM server patches to ensure retokenization consistency during multi-step rollouts. The review feedback identifies several architectural and stability concerns: fragile dependencies on vLLM's internal data structures, tight coupling with verl's private methods, and potential resource exhaustion issues related to high aiohttp limits and memory-intensive dataset loading. Additionally, the reviewer suggests using public Ray APIs and more robust port-binding logic to prevent race conditions.
```python
# tested on vLLM 0.17.0. other versions may error
template_prefix_ids = prefix_res[1][0]["prompt_token_ids"]

tok = self.renderer.get_tokenizer()
engine_prompt = res[1][0]
engine_prompt["prompt_token_ids"] = _replace_prefix_tokens(
    required_prefix,
    template_prefix_ids,
    engine_prompt["prompt_token_ids"],
    tok,
)
return res
```
The dependency on the specific index res[1][0] and prefix_res[1][0] from vLLM's internal _preprocess_chat output is extremely fragile. This structure is not part of vLLM's public API and is subject to change between versions, which will break this integration. This creates a high maintenance burden for future vLLM updates.
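One way to contain the fragility is to isolate the internal-layout access behind a single helper that fails with a clear diagnostic instead of a bare `IndexError`/`KeyError` when a vLLM upgrade changes the shape. The tuple layout assumed below (`(messages, engine_prompts)` with `engine_prompts[0]["prompt_token_ids"]`) is this sketch's assumption, mirroring the snippet above, not a documented vLLM contract:

```python
def extract_prompt_token_ids(preprocess_result):
    """Defensively pull prompt_token_ids out of a _preprocess_chat-style
    result. The (messages, engine_prompts) layout is a vLLM internal
    detail (assumed here), so raise a descriptive error rather than a
    bare IndexError/KeyError when that layout changes between versions."""
    try:
        engine_prompt = preprocess_result[1][0]
        return engine_prompt["prompt_token_ids"]
    except (IndexError, KeyError, TypeError) as exc:
        raise RuntimeError(
            "Unexpected _preprocess_chat output layout; the vLLM internal "
            f"structure this integration relies on may have changed: {exc}"
        ) from exc
```

This does not remove the dependency, but it centralizes it in one place and turns a silent version skew into an actionable error at the first call.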
```python
# _postprocess is an AgentLoopWorker method that stacks _InternalAgentLoopOutput
# into a DataProto batch. It accesses self.distillation_enabled and
# self.reward_loop_worker_handles, both of which we set in _init_nemo_gym.
_postprocess = _AgentLoopWorker._postprocess
```
```python
initial_global_cfg.setdefault("global_aiohttp_connector_limit_per_host", 16_384)
initial_global_cfg.setdefault("global_aiohttp_connector_limit", 65_536)
```
The configured aiohttp connector limits (16k per host, 64k total) are very high and may exceed the system's open file descriptor limit (ulimit -n). This could lead to OSError: [Errno 24] Too many open files under heavy load. Consider setting these to more conservative values or ensuring the execution environment is configured with a sufficiently high ulimit.
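A conservative option is to clamp the defaults against the process's soft file-descriptor limit before handing them to the connector. This is a sketch of that idea (the `reserve` margin for non-connection fds is an arbitrary assumption, not a NeMo Gym or aiohttp setting):

```python
import resource


def clamp_connector_limits(limit_total, limit_per_host, reserve=1024):
    """Clamp aiohttp connector limits so the total connection count stays
    under the soft RLIMIT_NOFILE, keeping `reserve` descriptors free for
    files, pipes, and sockets opened elsewhere in the process."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    budget = max(soft - reserve, 1)
    clamped_total = min(limit_total, budget)
    clamped_per_host = min(limit_per_host, clamped_total)
    return clamped_total, clamped_per_host
```

With the PR's defaults, an environment whose `ulimit -n` is below ~64k would then get automatically reduced limits instead of failing at runtime with `Errno 24`.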
```python
initial_global_cfg.setdefault("uv_venv_dir", str(nemo_gym_root))
initial_global_cfg.setdefault("skip_venv_if_present", True)

node_ip = ray._private.services.get_node_ip_address()
```
Using ray._private.services.get_node_ip_address() is risky as private APIs are not stable and can change or be removed in any Ray update. Use the public ray.util.get_node_ip_address() instead.
```diff
- node_ip = ray._private.services.get_node_ip_address()
+ node_ip = ray.util.get_node_ip_address()
```
```python
with socket.socket() as s:
    s.bind(("", 0))
    head_port = s.getsockname()[1]
initial_global_cfg[HEAD_SERVER_KEY_NAME] = {"host": "0.0.0.0", "port": head_port}
```
Closing the socket s before the RunHelper actually binds to head_port creates a race condition where another process on the system could claim the port in the interval. It is more robust to let the underlying server bind to port 0 directly and then query the assigned port from the running instance, if the NeMo Gym API supports it.
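If the NeMo Gym API cannot bind port 0 itself, the race can still be avoided by keeping the kernel-assigned socket open until the server takes ownership of it, rather than closing it first. A minimal sketch of the bind-then-query pattern (the function name is illustrative, not part of any API here):

```python
import socket


def start_on_ephemeral_port():
    """Bind a listening socket to port 0 so the kernel assigns a free
    port, then query it with getsockname(). The socket stays open, so
    no other process can claim the port before the server uses it."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind(("0.0.0.0", 0))
    sock.listen()
    _host, port = sock.getsockname()
    return sock, port
```

The key difference from the snippet above is that the returned socket is never closed before use; the server either inherits the open file descriptor or the port is only released once the real listener is ready.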
```python
with open(path) as f:
    for line in f:
        line = line.strip()
        if line:
            self._rows.append(json.loads(line))
```
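On the memory concern raised for large JSONL files: instead of materializing every parsed row eagerly, the dataset could index byte offsets at construction time and parse rows on demand. A sketch of that alternative (the class name and interface are hypothetical, not verl's dataset API):

```python
import json


class LazyJSONLIndex:
    """Index a JSONL file by byte offset so rows are parsed on demand,
    keeping only one integer per row in memory instead of every record."""

    def __init__(self, path):
        self._path = path
        self._offsets = []
        with open(path, "rb") as f:
            offset = 0
            for line in f:
                if line.strip():
                    self._offsets.append(offset)
                offset += len(line)

    def __len__(self):
        return len(self._offsets)

    def __getitem__(self, idx):
        with open(self._path, "rb") as f:
            f.seek(self._offsets[idx])
            return json.loads(f.readline())
```

This trades a file open + seek per access for a near-constant memory footprint, which is usually a good deal for multi-gigabyte rollout datasets.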
What does this PR do?
Integrates NVIDIA NeMo Gym RL Environments, enabling multi-environment and custom-agent RL on top of verl's Megatron vLLM path.

- `recipe/nemo_gym/agent_loop.py`: `NemoGymAgentLoopManager` overrides `generate_sequences()` to offload rollout collection to NeMo Gym's `RolloutCollectionHelper`, including full multi-turn loops, tools, agent harnesses, environment logic, and reward functions. Formats results back into verl's `DataProto`.
- `recipe/nemo_gym/dataset.py`: `NemoGymJSONLDataset` loads the NeMo Gym JSONL format into verl's data pipeline.
- `recipe/nemo_gym/server_patch.py`: patches vLLM's `OpenAIServingChat` and `OpenAIServingTokenization` to correct token ID mismatches caused by chat-template retokenization in multi-turn rollouts, matching NeMo RL's approach. Requires verl#5833 ("[worker, vllm] feat: expose nemo gym token id patch") to expose the patch entrypoint on the vLLM server.
- `recipe/nemo_gym/configs/`: per-environment config YAMLs specifying `nemo_gym_root` and `config_paths`. Passed via `agent_loop_config_path`.
- `recipe/nemo_gym/submit_{math,workplace,multienv}.sh`: example Slurm submission scripts.

NeMo Gym agents and environments communicate over the HTTP OpenAI Responses API. Multi-turn token IDs are re-tokenized each turn via the chat template, so verl's vLLM server is patched (only on the NeMo Gym path) to correct retokenization errors. Requires verl-project/verl#5833 to apply the patch defined here. Agent Gateway may remove the need for this patch. Tool, user, and environment role masking (mask=0 for non-assistant tokens, mask=1 for assistant tokens) is applied in postprocessing.
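The role-masking rule described above (train only on assistant tokens) can be sketched as a flat per-token mask builder. The `(role, token_ids)` turn structure here is an illustrative assumption; verl's actual postprocessing operates on `DataProto` batches:

```python
def build_loss_mask(turns):
    """Build a flat per-token loss mask from (role, token_ids) turns:
    1 for assistant tokens (included in the loss), 0 for tool, user,
    and environment tokens (excluded)."""
    mask = []
    for role, token_ids in turns:
        value = 1 if role == "assistant" else 0
        mask.extend([value] * len(token_ids))
    return mask
```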
Tested on 9 NeMo Gym environments, including multi-turn and multi-environment setups.
Important
Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.
- Run `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always`.
- Request CI in the `ci-request` channel in the `verl` Slack workspace. (If not accessible, please try the Feishu group (飞书群).)
- If updating the `recipe` submodule, please also update the reference to the submodule commit via `git submodule update --remote` or `cd recipe && git pull origin main`.