feat: nemo gym recipe#80

Open
cmunley1 wants to merge 5 commits into verl-project:main from cmunley1:cmunley1/nemo-gym

Conversation


@cmunley1 cmunley1 commented Apr 6, 2026

What does this PR do?

Integrates NVIDIA NeMo Gym RL environments, enabling multi-environment and custom-agent RL on top of verl's Megatron vLLM path.

  • recipe/nemo_gym/agent_loop.py: NemoGymAgentLoopManager overrides generate_sequences() to offload rollout collection to NeMo Gym's RolloutCollectionHelper, including full multi-turn loops, tools, agent harnesses, environment logic, and reward functions, then formats the results back into verl's DataProto.
  • recipe/nemo_gym/dataset.py: NemoGymJSONLDataset loads NeMo Gym JSONL format into verl's data pipeline.
  • recipe/nemo_gym/server_patch.py: patches vLLM's OpenAIServingChat and OpenAIServingTokenization to correct token ID mismatches caused by chat-template retokenization in multi-turn rollouts, matching NeMo RL's approach. Requires verl-project/verl#5833 ([worker, vllm] feat: expose nemo gym token id patch) to expose the patch entrypoint on the vLLM server.
  • recipe/nemo_gym/configs/: per-environment config YAMLs specifying nemo_gym_root and config_paths. Passed via agent_loop_config_path.
  • recipe/nemo_gym/submit_{math,workplace,multienv}.sh: example Slurm submission scripts.

NeMo Gym agents and environments communicate over the HTTP OpenAI Responses API. Multi-turn token IDs are re-tokenized each turn via the chat template, so verl's vLLM server is patched, on the NeMo Gym path only, to correct retokenization errors. Requires verl-project/verl#5833 to apply the patch defined here; Agent Gateway may eventually remove the need for this patch. Tool, user, and environment role masking (mask=0 for non-assistant tokens, mask=1 for assistant tokens) is applied in postprocessing.
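The role masking described above can be sketched as follows. This is a minimal illustration of the idea (assistant tokens get loss mask 1, everything else 0), not verl's actual postprocessing code; the segment representation and function name are hypothetical.

```python
# Hypothetical sketch of per-role loss masking: only assistant-generated
# tokens contribute to the policy loss; tool, user, and environment tokens
# are masked out. Data layout here is illustrative, not verl's API.

def build_loss_mask(segments):
    """segments: list of (role, token_ids) pairs for one multi-turn rollout."""
    token_ids, loss_mask = [], []
    for role, ids in segments:
        token_ids.extend(ids)
        loss_mask.extend([1 if role == "assistant" else 0] * len(ids))
    return token_ids, loss_mask

ids, mask = build_loss_mask([
    ("user", [101, 102]),
    ("assistant", [201, 202, 203]),
    ("tool", [301]),
    ("assistant", [401, 402]),
])
# only the assistant segments are marked trainable
```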

Tested on 9 NeMo Gym environments, including multi-turn and multi-environment setups, for example:

[image: example results]

### Checklist Before Submitting

Important

Please check all the following items before requesting a review; otherwise the reviewer might deprioritize this PR.

cmunley1 added 5 commits April 4, 2026 17:53
Signed-off-by: cmunley1 <[email protected]>
@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces the NVIDIA NeMo Gym integration for verl, enabling scalable multi-environment RL training. Key additions include the NemoGymAgentLoopManager for rollout collection, a JSONL dataset loader, and vLLM server patches to ensure retokenization consistency during multi-step rollouts. The review feedback identifies several architectural and stability concerns: fragile dependencies on vLLM's internal data structures, tight coupling with verl's private methods, and potential resource exhaustion issues related to high aiohttp limits and memory-intensive dataset loading. Additionally, the reviewer suggests using public Ray APIs and more robust port-binding logic to prevent race conditions.

Comment on lines +105 to +116
# tested on vLLM 0.17.0. other versions may error
template_prefix_ids = prefix_res[1][0]["prompt_token_ids"]

tok = self.renderer.get_tokenizer()
engine_prompt = res[1][0]
engine_prompt["prompt_token_ids"] = _replace_prefix_tokens(
required_prefix,
template_prefix_ids,
engine_prompt["prompt_token_ids"],
tok,
)
return res

high

The dependency on the specific index res[1][0] and prefix_res[1][0] from vLLM's internal _preprocess_chat output is extremely fragile. This structure is not part of vLLM's public API and is subject to change between versions, which will break this integration. This creates a high maintenance burden for future vLLM updates.

# _postprocess is an AgentLoopWorker method that stacks _InternalAgentLoopOutput
# into a DataProto batch. It accesses self.distillation_enabled and
# self.reward_loop_worker_handles, both of which we set in _init_nemo_gym.
_postprocess = _AgentLoopWorker._postprocess

medium

Directly accessing and re-binding the private method _AgentLoopWorker._postprocess violates encapsulation and creates a tight coupling with the internal implementation of AgentLoopWorker. This makes the code prone to breaking if verl internals are refactored.

Comment on lines +140 to +141
initial_global_cfg.setdefault("global_aiohttp_connector_limit_per_host", 16_384)
initial_global_cfg.setdefault("global_aiohttp_connector_limit", 65_536)

medium

The configured aiohttp connector limits (16k per host, 64k total) are very high and may exceed the system's open file descriptor limit (ulimit -n). This could lead to OSError: [Errno 24] Too many open files under heavy load. Consider setting these to more conservative values or ensuring the execution environment is configured with a sufficiently high ulimit.
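One way to act on this suggestion is to clamp the requested connector limits against the process's actual file-descriptor budget at startup. The sketch below is illustrative (the helper name, reserve margin, and floor are assumptions, not part of the PR); the resulting values would be passed to aiohttp's TCPConnector.

```python
# Clamp aiohttp connector limits to the soft RLIMIT_NOFILE so heavy load
# cannot exhaust file descriptors. Margins and names are illustrative.
import resource

def safe_connector_limits(requested_total=65_536, requested_per_host=16_384,
                          reserve=1_024):
    """Return (limit, limit_per_host) that fit under `ulimit -n`,
    keeping `reserve` descriptors free for logs, datasets, Ray, etc."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    budget = max(soft - reserve, 256)          # never go below a small floor
    total = min(requested_total, budget)
    per_host = min(requested_per_host, total)  # per-host never exceeds total
    return total, per_host

total, per_host = safe_connector_limits()
# e.g. aiohttp.TCPConnector(limit=total, limit_per_host=per_host)
```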

initial_global_cfg.setdefault("uv_venv_dir", str(nemo_gym_root))
initial_global_cfg.setdefault("skip_venv_if_present", True)

node_ip = ray._private.services.get_node_ip_address()

medium

Using ray._private.services.get_node_ip_address() is risky as private APIs are not stable and can change or be removed in any Ray update. Use the public ray.util.get_node_ip_address() instead.

Suggested change
node_ip = ray._private.services.get_node_ip_address()
node_ip = ray.util.get_node_ip_address()

Comment on lines +151 to +154
with socket.socket() as s:
s.bind(("", 0))
head_port = s.getsockname()[1]
initial_global_cfg[HEAD_SERVER_KEY_NAME] = {"host": "0.0.0.0", "port": head_port}

medium

Closing the socket s before the RunHelper actually binds to head_port creates a race condition where another process on the system could claim the port in the interval. It is more robust to let the underlying server bind to port 0 directly and then query the assigned port from the running instance, if the NeMo Gym API supports it.
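If the server can accept an already-bound socket, the race the reviewer describes disappears: bind port 0, keep the socket open, and hand the socket itself to the server rather than closing it and re-binding by number. The sketch below shows the pattern; `serve_on_socket` is a hypothetical handoff point, since it is not known whether the NeMo Gym API supports this.

```python
# Reserve an ephemeral port without releasing it: the kernel picks a free
# port at bind time, and the socket stays open until handed to the server.
import socket

def reserve_port():
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind(("", 0))            # port 0: kernel assigns a free port
    s.listen()
    return s, s.getsockname()[1]

sock, port = reserve_port()
# later: serve_on_socket(sock)  # hypothetical; instead of re-binding `port`
```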

Comment on lines +40 to +44
with open(path) as f:
for line in f:
line = line.strip()
if line:
self._rows.append(json.loads(line))

medium

Loading the entire JSONL dataset into memory at once will cause Out-Of-Memory (OOM) issues when dealing with large-scale RL datasets. Implementing a lazy-loading mechanism that reads lines on demand or indexes file offsets would improve scalability.
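A lazy reader along the lines the review suggests could index byte offsets in one pass and parse rows on demand, keeping memory proportional to the number of lines rather than their content. This is an illustrative sketch, not the PR's NemoGymJSONLDataset.

```python
# Lazy JSONL access: one pass records the byte offset of each non-empty
# line; __getitem__ seeks and parses a single row on demand.
import json
import os
import tempfile

class LazyJSONL:
    def __init__(self, path):
        self.path = path
        self.offsets = []
        with open(path, "rb") as f:
            while True:
                pos = f.tell()
                line = f.readline()
                if not line:
                    break
                if line.strip():
                    self.offsets.append(pos)

    def __len__(self):
        return len(self.offsets)

    def __getitem__(self, i):
        with open(self.path, "rb") as f:
            f.seek(self.offsets[i])
            return json.loads(f.readline())

# quick demo with a throwaway file
with tempfile.NamedTemporaryFile("w", suffix=".jsonl", delete=False) as tmp:
    tmp.write('{"prompt": "2+2"}\n\n{"prompt": "3+3"}\n')
ds = LazyJSONL(tmp.name)
row = ds[1]
os.unlink(tmp.name)
```

Reopening the file per row keeps the reader safe to share across workers; a per-process cached file handle would avoid the reopen cost if needed.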

@cmunley1 cmunley1 changed the title nemo gym integration feat: nemo gym recipe Apr 7, 2026