[worker, vllm] feat: expose nemo gym token id patch#5833
cmunley1 wants to merge 19 commits into verl-project:main
Conversation
Signed-off-by: cmunley1 <cmunley@nvidia.com>
Code Review
This pull request introduces NVIDIA NeMo Gym integration, adding a custom agent loop manager, a specialized JSONL dataset loader, and a vLLM server patch for multi-turn RL support. The review feedback highlights several critical improvements for robustness: using Ray's utility for reliable node IP resolution in distributed clusters, handling cases where agent references might be strings to prevent TypeErrors, and refining the response budget logic to safely handle scenarios where prompt length approaches or exceeds the maximum model length.
    initial_global_cfg.setdefault("uv_venv_dir", str(nemo_gym_root))
    initial_global_cfg.setdefault("skip_venv_if_present", True)
    ...
    node_ip = socket.gethostbyname(socket.gethostname())
Using socket.gethostbyname(socket.gethostname()) is unreliable for determining the node's IP address in distributed environments, as it may return the loopback address or an internal IP that is not reachable by other nodes. It is recommended to use ray.util.get_node_ip_address() which is more robust in Ray-based clusters and consistent with other parts of the codebase.
Suggested change:

    - node_ip = socket.gethostbyname(socket.gethostname())
    + node_ip = ray.util.get_node_ip_address()
        result = _postprocess_nemo_gym_result(nemo_gym_result, self._tokenizer)
    except ValueError:
        result = _empty_result(nemo_gym_row, self._tokenizer)
    result["env"] = nemo_gym_row["agent_ref"]["name"]
This line assumes that agent_ref is always a dictionary containing a "name" key. However, in NeMo Gym, agent_ref can sometimes be a string (e.g., a direct URL). Accessing ["name"] on a string will raise a TypeError. Consider adding a check or a fallback.
Suggested change:

    - result["env"] = nemo_gym_row["agent_ref"]["name"]
    + result["env"] = nemo_gym_row["agent_ref"].get("name", "unknown") if isinstance(nemo_gym_row["agent_ref"], dict) else str(nemo_gym_row["agent_ref"])
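The fallback in the suggestion can be factored into a small helper. This is a hypothetical sketch (env_name is not a function in the PR); it only illustrates handling both shapes of agent_ref:

```python
def env_name(agent_ref):
    """Return a display name for an agent reference that may be a
    dict ({"name": ...}) or a plain string (e.g. a direct URL)."""
    if isinstance(agent_ref, dict):
        # missing "name" key falls back to a placeholder
        return agent_ref.get("name", "unknown")
    return str(agent_ref)
```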
    response_budget = (int(max_model_len) - prompt_length) if max_model_len else None
    response_length = max(response_lens) if response_lens else self.rollout_config.response_length
    if response_budget:
        response_length = min(response_length, response_budget)
If prompt_length exceeds or equals max_model_len, response_budget will be non-positive. The current logic if response_budget: will evaluate to False when the budget is 0, skipping the cap and potentially leading to a crash or OOM. Additionally, a negative budget would cause tokenizer.pad to fail. The budget should be explicitly checked against None and clamped to a minimum of 0.
Suggested change:

    - response_budget = (int(max_model_len) - prompt_length) if max_model_len else None
    + response_budget = (int(max_model_len) - prompt_length) if max_model_len is not None else None
      response_length = max(response_lens) if response_lens else self.rollout_config.response_length
    - if response_budget:
    + if response_budget is not None:
    -     response_length = min(response_length, response_budget)
    +     response_length = max(0, min(response_length, response_budget))
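The clamped-budget logic can be checked in isolation. A minimal sketch, assuming illustrative plain arguments in place of verl's actual config objects (capped_response_length is a hypothetical name):

```python
def capped_response_length(prompt_length, max_model_len, response_lens,
                           default_response_length):
    # budget is None when no max_model_len is configured
    budget = (int(max_model_len) - prompt_length) if max_model_len is not None else None
    length = max(response_lens) if response_lens else default_response_length
    if budget is not None:
        # clamp to [0, budget]: a prompt at or beyond max_model_len
        # yields 0 rather than a negative length
        length = max(0, min(length, budget))
    return length
```

Note that a zero or negative budget now yields 0 instead of silently skipping the cap, which is the edge case the review comment flags.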
We're working on Agent Gateway (#5790) to integrate each agent framework seamlessly into verl for agentic RL training. For now, you can submit this PR to https://github.com/verl-project/verl-recipe
Moved here: verl-project/verl-recipe#80. I think the small patch, which is only called by NeMo Gym, is still needed here at the moment; AgentGateway may change things. Thanks!
What does this PR do?
Allows NeMo Gym to patch vLLM's chat serving layer to preserve token IDs across multi-turn rollouts, preventing retokenization mismatches during RL training. It may be replaced by Agent Gateway (#5790) in the future. Tied to verl-project/verl-recipe#80. This is a minimal change with no impact on non-NeMo Gym workloads.
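The retokenization mismatch can be shown with a toy greedy longest-match tokenizer (the vocabulary and functions here are purely illustrative, not the model's actual tokenizer): decoding emitted token ids to text and re-encoding can merge pieces differently, so the ids change between turns.

```python
# Toy vocabulary where "ab" exists both as one merged token and as
# two single-character tokens.
VOCAB = {"a": 0, "b": 1, "ab": 2}
INV = {v: k for k, v in VOCAB.items()}

def encode(text):
    # greedy longest-match: always prefer the longer piece
    ids, i = [], 0
    while i < len(text):
        for length in (2, 1):
            piece = text[i:i + length]
            if piece in VOCAB:
                ids.append(VOCAB[piece])
                i += length
                break
    return ids

def decode(ids):
    return "".join(INV[t] for t in ids)

# Suppose the model emitted "a" and "b" as two separate tokens;
# re-encoding the decoded text greedily merges them into "ab",
# producing different ids than the model actually generated.
emitted = [0, 1]
reencoded = encode(decode(emitted))
assert reencoded != emitted
```

Preserving the original ids across turns, as this patch does, avoids training on a token sequence the model never produced.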
Exposes apply_nemo_gym_server_patch() on vLLMHttpServer, called by NemoGymAgentLoopManager at startup via a Ray remote call. It patches OpenAIServingChat._preprocess_chat and OpenAIServingTokenization._preprocess_chat at runtime inside the vLLM server's Ray actor process. This is needed because the patch must execute in the server process; it cannot be applied across the process boundary. The patch implements the same logic as NeMo RL. Tested on vLLM 0.17.0 (verlai/verl:vllm017.latest).
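The runtime patching pattern can be sketched generically. ServingChat, apply_server_patch, and provided_token_ids below are stand-in names, not the actual vLLM or NeMo Gym APIs; the point is only that the replacement must run in the process that owns the class for callers in that process to see it:

```python
class ServingChat:
    # stand-in for the real serving class living inside the server process
    def _preprocess_chat(self, messages):
        # placeholder for the real preprocessing (which would retokenize)
        return {"messages": messages, "token_ids": None}

def apply_server_patch():
    original = ServingChat._preprocess_chat

    def patched(self, messages, provided_token_ids=None):
        result = original(self, messages)
        # e.g. carry caller-provided token ids through unchanged
        # instead of retokenizing the rendered text
        if provided_token_ids is not None:
            result["token_ids"] = provided_token_ids
        return result

    ServingChat._preprocess_chat = patched

apply_server_patch()
```

In the PR, the analogous call is made via ray remote so that it executes inside the vLLM server's actor process rather than in the driver.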