[worker, vllm] feat: expose nemo gym token id patch#5833

Open
cmunley1 wants to merge 19 commits into verl-project:main from cmunley1:cmunley1/nemo-gym-int-dev-retok


@cmunley1 cmunley1 commented Mar 31, 2026

What does this PR do?

Allows NeMo Gym to patch vLLM's chat serving layer so that token IDs are preserved across multi-turn rollouts, preventing retokenization mismatches during RL training. May be replaced by #5790 in the future. Tied to verl-project/verl-recipe#80. Minimal change with no impact on non-NeMo Gym workloads.
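To illustrate what a retokenization mismatch looks like, here is a toy sketch. The vocabulary and the greedy longest-match tokenizer are made up for illustration and are not taken from vLLM or NeMo Gym; the point is only that decoding sampled IDs to text and encoding that text again need not return the same IDs.

```python
# Toy vocabulary (an assumption for illustration, not a real tokenizer).
VOCAB = {"a": 0, "b": 1, "ab": 2}
ID_TO_TOKEN = {token_id: token for token, token_id in VOCAB.items()}


def encode(text):
    """Greedy longest-match encoding over the toy vocabulary."""
    ids, pos = [], 0
    while pos < len(text):
        # Pick the longest vocabulary entry that matches at the current position.
        match = max((t for t in VOCAB if text.startswith(t, pos)), key=len)
        ids.append(VOCAB[match])
        pos += len(match)
    return ids


def decode(ids):
    """Concatenate the token strings for the given IDs."""
    return "".join(ID_TO_TOKEN[i] for i in ids)


# The model sampled "a" then "b" as two separate tokens in turn 1.
sampled_ids = [0, 1]
# Turn 2 rebuilds the prompt from text, so the history is encoded again.
retokenized_ids = encode(decode(sampled_ids))

print(sampled_ids, retokenized_ids)
```

The text is identical in both cases, but the IDs the trainer would see for the history differ from the IDs the policy actually sampled, which is the mismatch this PR aims to avoid.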

Exposes apply_nemo_gym_server_patch() on vLLMHttpServer. NemoGymAgentLoopManager calls it at startup via a Ray remote call to patch OpenAIServingChat._preprocess_chat and OpenAIServingTokenization._preprocess_chat at runtime inside the vLLM server's Ray actor process. This seems to be needed because the patch has to execute in the server process; a patch applied in another process can't cross the process boundary. The patch implements the same logic as NeMo RL. Tested on vLLM 0.17.0 (verlai/verl:vllm017.latest).
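A minimal sketch of the general technique (runtime patching of a method on a class), under loud assumptions: StandInServingChat, apply_token_id_patch, the "token_ids" message key, and the character-level "tokenizer" are all hypothetical stand-ins, not vLLM's real classes or signatures and not the actual patch in this PR.

```python
import functools


class StandInServingChat:
    """Hypothetical stand-in for a serving class; not vLLM's real API."""

    def _preprocess_chat(self, messages):
        # Pretend tokenization: one ID per character of the joined contents.
        return [ord(c) for c in "".join(m["content"] for m in messages)]


def apply_token_id_patch(cls):
    """Rebind cls._preprocess_chat so stored token IDs are reused.

    Returns True if the patch was applied, False if it was already in place.
    """
    if getattr(cls, "_token_id_patch_applied", False):
        return False
    original = cls._preprocess_chat

    @functools.wraps(original)
    def patched(self, messages):
        ids = []
        for message in messages:
            if "token_ids" in message:
                # Reuse the IDs recorded on an earlier turn as-is.
                ids.extend(message["token_ids"])
            else:
                # Fall back to the original tokenization for new messages.
                ids.extend(original(self, [message]))
        return ids

    cls._preprocess_chat = patched
    cls._token_id_patch_applied = True
    return True


first = apply_token_id_patch(StandInServingChat)
second = apply_token_id_patch(StandInServingChat)  # no-op on the second call
history = [
    {"content": "hi", "token_ids": [1, 2, 3]},  # IDs kept from an earlier turn
    {"content": "ok"},                          # new message, tokenized now
]
prompt_ids = StandInServingChat()._preprocess_chat(history)
```

Rebinding a class attribute like this only affects the Python process in which it runs, which is consistent with the description above: the call has to be made inside the server's actor process rather than in the caller's process.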

cmunley1 added 16 commits March 27, 2026 20:16
Signed-off-by: cmunley1 <cmunley@nvidia.com>
@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces NVIDIA NeMo Gym integration, adding a custom agent loop manager, a specialized JSONL dataset loader, and a vLLM server patch for multi-turn RL support. The review feedback highlights several critical improvements for robustness: using Ray's utility for reliable node IP resolution in distributed clusters, handling cases where agent references might be strings to prevent TypeErrors, and refining the response budget logic to safely handle scenarios where prompt length approaches or exceeds the maximum model length.

initial_global_cfg.setdefault("uv_venv_dir", str(nemo_gym_root))
initial_global_cfg.setdefault("skip_venv_if_present", True)

node_ip = socket.gethostbyname(socket.gethostname())
high

Using socket.gethostbyname(socket.gethostname()) is unreliable for determining the node's IP address in distributed environments, as it may return the loopback address or an internal IP that is not reachable by other nodes. It is recommended to use ray.util.get_node_ip_address() which is more robust in Ray-based clusters and consistent with other parts of the codebase.

Suggested change
- node_ip = socket.gethostbyname(socket.gethostname())
+ node_ip = ray.util.get_node_ip_address()
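A hedged sketch of the suggestion above. ray.util.get_node_ip_address() is the call the review recommends; the resolve_node_ip helper name and the fallback path (a UDP socket probe, then the hostname lookup) are assumptions added here only so the snippet also runs where Ray is not installed.

```python
import ipaddress
import socket


def resolve_node_ip():
    """Return an IP address for this node, preferring Ray's helper."""
    try:
        import ray  # optional here; the PR's code already depends on Ray

        return ray.util.get_node_ip_address()
    except ImportError:
        pass
    try:
        # Connecting a UDP socket sends no packets; it only selects the
        # outbound interface, whose address we then read back.
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as probe:
            probe.connect(("8.8.8.8", 80))
            return probe.getsockname()[0]
    except OSError:
        pass
    try:
        # Last resort: the hostname lookup, which may yield a loopback address.
        return socket.gethostbyname(socket.gethostname())
    except OSError:
        return "127.0.0.1"


node_ip = resolve_node_ip()
if ipaddress.ip_address(node_ip).is_loopback:
    print(f"warning: {node_ip} is a loopback address, unreachable from other nodes")
else:
    print(node_ip)
```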

result = _postprocess_nemo_gym_result(nemo_gym_result, self._tokenizer)
except ValueError:
result = _empty_result(nemo_gym_row, self._tokenizer)
result["env"] = nemo_gym_row["agent_ref"]["name"]
high

This line assumes that agent_ref is always a dictionary containing a "name" key. However, in NeMo Gym, agent_ref can sometimes be a string (e.g., a direct URL). Accessing ["name"] on a string will raise a TypeError. Consider adding a check or a fallback.

Suggested change
- result["env"] = nemo_gym_row["agent_ref"]["name"]
+ result["env"] = nemo_gym_row["agent_ref"].get("name", "unknown") if isinstance(nemo_gym_row["agent_ref"], dict) else str(nemo_gym_row["agent_ref"])
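The same fallback can be written as a small helper, which may be easier to read and test than the one-line conditional. The function name env_name_from_agent_ref is hypothetical; the behavior mirrors the suggested change above.

```python
def env_name_from_agent_ref(agent_ref, default="unknown"):
    """Return an environment name for a dict-style or string-style agent_ref."""
    if isinstance(agent_ref, dict):
        # Dict form: use the "name" key, falling back to a default if absent.
        return agent_ref.get("name", default)
    # Any other form (for example a URL string) is used as the name itself.
    return str(agent_ref)


print(env_name_from_agent_ref({"name": "math_env"}))
print(env_name_from_agent_ref({"url": "http://localhost:8000"}))
print(env_name_from_agent_ref("http://localhost:8000"))
```

With this helper the assignment would read result["env"] = env_name_from_agent_ref(nemo_gym_row["agent_ref"]).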

Comment on lines +203 to +206
response_budget = (int(max_model_len) - prompt_length) if max_model_len else None
response_length = max(response_lens) if response_lens else self.rollout_config.response_length
if response_budget:
response_length = min(response_length, response_budget)
high

If prompt_length exceeds or equals max_model_len, response_budget will be non-positive. The current logic if response_budget: will evaluate to False when the budget is 0, skipping the cap and potentially leading to a crash or OOM. Additionally, a negative budget would cause tokenizer.pad to fail. The budget should be explicitly checked against None and clamped to a minimum of 0.

Suggested change
- response_budget = (int(max_model_len) - prompt_length) if max_model_len else None
- response_length = max(response_lens) if response_lens else self.rollout_config.response_length
- if response_budget:
- response_length = min(response_length, response_budget)
+ response_budget = (int(max_model_len) - prompt_length) if max_model_len is not None else None
+ response_length = max(response_lens) if response_lens else self.rollout_config.response_length
+ if response_budget is not None:
+ response_length = max(0, min(response_length, response_budget))
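To make the edge cases concrete, here is a standalone sketch of the suggested logic next to the original. The function names and the default_length parameter (standing in for self.rollout_config.response_length) are hypothetical.

```python
def original_response_length(response_lens, default_length, prompt_length, max_model_len=None):
    """Logic as in the PR: a budget of 0 is falsy, so the cap is skipped."""
    budget = (int(max_model_len) - prompt_length) if max_model_len else None
    length = max(response_lens) if response_lens else default_length
    if budget:
        length = min(length, budget)
    return length


def capped_response_length(response_lens, default_length, prompt_length, max_model_len=None):
    """Suggested logic: compare against None and clamp the result at 0."""
    length = max(response_lens) if response_lens else default_length
    if max_model_len is None:
        return length
    budget = int(max_model_len) - prompt_length
    return max(0, min(length, budget))


# Prompt exactly fills the context window: the budget is 0.
print(original_response_length([10, 20], 64, 100, 100))  # cap skipped
print(capped_response_length([10, 20], 64, 100, 100))    # clamped to 0

# Prompt exceeds the context window: the budget is negative.
print(original_response_length([10, 20], 64, 120, 100))  # negative length
print(capped_response_length([10, 20], 64, 120, 100))    # clamped to 0
```

In the zero-budget case the original returns the uncapped length of 20, and in the negative-budget case it returns -20; the suggested version returns 0 in both.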

@wuxibin89 wuxibin89 requested a review from ISEEKYAN April 1, 2026 02:17

wuxibin89 commented Apr 1, 2026

NeMo Gym agents and environments work through the HTTP OpenAI Responses API. Multi-turn token IDs are re-tokenized each turn via the chat template.

We're working on Agent Gateway #5790 to make each agent framework seamlessly integrated into verl for agentic rl training.

For now, you can submit this PR to https://github.com/verl-project/verl-recipe

cmunley1 added 3 commits April 4, 2026 02:04
Signed-off-by: cmunley1 <cmunley@nvidia.com>

cmunley1 commented Apr 6, 2026

Moved to verl-project/verl-recipe#80. I think the small patch, which is only called by NeMo Gym, is still needed here at the moment; Agent Gateway may change things.

Thanks!

cmunley1 changed the title from "feat: NVIDIA NeMo Gym Integration" to "feat: nemo gym vllm support" on Apr 6, 2026
cmunley1 changed the title from "feat: nemo gym vllm support" to "[worker, vllm] feat: nemo gym vllm server patch hook" on Apr 7, 2026
cmunley1 changed the title from "[worker, vllm] feat: nemo gym vllm server patch hook" to "[worker, vllm] feat: nemo gym token ids support" on Apr 7, 2026
cmunley1 changed the title from "[worker, vllm] feat: nemo gym token ids support" to "[worker, vllm] feat: expose nemo gym token id patch" on Apr 7, 2026