[worker, vllm] feat: expose nemo gym token id patch#5833
cmunley1 wants to merge 19 commits into verl-project:main
Conversation
Signed-off-by: cmunley1 <cmunley@nvidia.com>
Code Review
This pull request introduces NVIDIA NeMo Gym integration, adding a custom agent loop manager, a specialized JSONL dataset loader, and a vLLM server patch for multi-turn RL support. The review feedback highlights several critical improvements for robustness: using Ray's utility for reliable node IP resolution in distributed clusters, handling cases where agent references might be strings to prevent TypeErrors, and refining the response budget logic to safely handle scenarios where prompt length approaches or exceeds the maximum model length.
    initial_global_cfg.setdefault("uv_venv_dir", str(nemo_gym_root))
    initial_global_cfg.setdefault("skip_venv_if_present", True)
    ...
    node_ip = socket.gethostbyname(socket.gethostname())
Using socket.gethostbyname(socket.gethostname()) is unreliable for determining the node's IP address in distributed environments, as it may return the loopback address or an internal IP that is not reachable by other nodes. It is recommended to use ray.util.get_node_ip_address() which is more robust in Ray-based clusters and consistent with other parts of the codebase.
Suggested change:

    - node_ip = socket.gethostbyname(socket.gethostname())
    + node_ip = ray.util.get_node_ip_address()
        result = _postprocess_nemo_gym_result(nemo_gym_result, self._tokenizer)
    except ValueError:
        result = _empty_result(nemo_gym_row, self._tokenizer)
    result["env"] = nemo_gym_row["agent_ref"]["name"]
This line assumes that agent_ref is always a dictionary containing a "name" key. However, in NeMo Gym, agent_ref can sometimes be a string (e.g., a direct URL). Accessing ["name"] on a string will raise a TypeError. Consider adding a check or a fallback.
Suggested change:

    - result["env"] = nemo_gym_row["agent_ref"]["name"]
    + result["env"] = nemo_gym_row["agent_ref"].get("name", "unknown") if isinstance(nemo_gym_row["agent_ref"], dict) else str(nemo_gym_row["agent_ref"])
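The fallback in the suggestion can be factored into a small helper. This is a hypothetical sketch (env_name is not a function in the PR); it only illustrates handling both shapes of agent_ref:

```python
def env_name(agent_ref):
    """Return a display name for an agent reference that may be a
    dict ({"name": ...}) or a plain string (e.g. a direct URL)."""
    if isinstance(agent_ref, dict):
        # missing "name" key falls back to a placeholder
        return agent_ref.get("name", "unknown")
    return str(agent_ref)
```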
    response_budget = (int(max_model_len) - prompt_length) if max_model_len else None
    response_length = max(response_lens) if response_lens else self.rollout_config.response_length
    if response_budget:
        response_length = min(response_length, response_budget)
If prompt_length exceeds or equals max_model_len, response_budget will be non-positive. The current logic if response_budget: will evaluate to False when the budget is 0, skipping the cap and potentially leading to a crash or OOM. Additionally, a negative budget would cause tokenizer.pad to fail. The budget should be explicitly checked against None and clamped to a minimum of 0.
Suggested change:

    - response_budget = (int(max_model_len) - prompt_length) if max_model_len else None
    + response_budget = (int(max_model_len) - prompt_length) if max_model_len is not None else None
      response_length = max(response_lens) if response_lens else self.rollout_config.response_length
    - if response_budget:
    + if response_budget is not None:
    -     response_length = min(response_length, response_budget)
    +     response_length = max(0, min(response_length, response_budget))
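The clamped-budget logic can be checked in isolation. A minimal sketch, assuming illustrative plain arguments in place of verl's actual config objects (capped_response_length is a hypothetical name):

```python
def capped_response_length(prompt_length, max_model_len, response_lens,
                           default_response_length):
    # budget is None when no max_model_len is configured
    budget = (int(max_model_len) - prompt_length) if max_model_len is not None else None
    length = max(response_lens) if response_lens else default_response_length
    if budget is not None:
        # clamp to [0, budget]: a prompt at or beyond max_model_len
        # yields 0 rather than a negative length
        length = max(0, min(length, budget))
    return length
```

Note that a zero or negative budget now yields 0 instead of silently skipping the cap, which is the edge case the review comment flags.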
We're working on Agent Gateway (#5790) to integrate each agent framework seamlessly into verl for agentic RL training. For now, you can submit this PR to https://github.com/verl-project/verl-recipe
Moved here: verl-project/verl-recipe#80. I think the small patch, which is only called by NeMo Gym, is still needed here at the moment; AgentGateway may change things. Thanks!
What does this PR do?
Allows NeMo Gym to patch vLLM's chat serving layer to preserve token IDs across multi-turn rollouts, preventing retokenization mismatches during RL training. It may be replaced by Agent Gateway (#5790) in the future. Tied to verl-project/verl-recipe#80. This is a minimal change with no impact on non-NeMo Gym workloads.
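The retokenization mismatch can be shown with a toy greedy longest-match tokenizer (the vocabulary and functions here are purely illustrative, not the model's actual tokenizer): decoding emitted token ids to text and re-encoding can merge pieces differently, so the ids change between turns.

```python
# Toy vocabulary where "ab" exists both as one merged token and as
# two single-character tokens.
VOCAB = {"a": 0, "b": 1, "ab": 2}
INV = {v: k for k, v in VOCAB.items()}

def encode(text):
    # greedy longest-match: always prefer the longer piece
    ids, i = [], 0
    while i < len(text):
        for length in (2, 1):
            piece = text[i:i + length]
            if piece in VOCAB:
                ids.append(VOCAB[piece])
                i += length
                break
    return ids

def decode(ids):
    return "".join(INV[t] for t in ids)

# Suppose the model emitted "a" and "b" as two separate tokens;
# re-encoding the decoded text greedily merges them into "ab",
# producing different ids than the model actually generated.
emitted = [0, 1]
reencoded = encode(decode(emitted))
assert reencoded != emitted
```

Preserving the original ids across turns, as this patch does, avoids training on a token sequence the model never produced.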
Exposes apply_nemo_gym_server_patch() on vLLMHttpServer, called by NemoGymAgentLoopManager at startup via a Ray remote call. It patches OpenAIServingChat._preprocess_chat and OpenAIServingTokenization._preprocess_chat at runtime inside the vLLM server's Ray actor process. This is needed because the patch must execute in the server process; it cannot be applied across the process boundary. The patch implements the same logic as NeMo RL. Tested on vLLM 0.17.0 (verlai/verl:vllm017.latest).
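The runtime patching pattern can be sketched generically. ServingChat, apply_server_patch, and provided_token_ids below are stand-in names, not the actual vLLM or NeMo Gym APIs; the point is only that the replacement must run in the process that owns the class for callers in that process to see it:

```python
class ServingChat:
    # stand-in for the real serving class living inside the server process
    def _preprocess_chat(self, messages):
        # placeholder for the real preprocessing (which would retokenize)
        return {"messages": messages, "token_ids": None}

def apply_server_patch():
    original = ServingChat._preprocess_chat

    def patched(self, messages, provided_token_ids=None):
        result = original(self, messages)
        # e.g. carry caller-provided token ids through unchanged
        # instead of retokenizing the rendered text
        if provided_token_ids is not None:
            result["token_ids"] = provided_token_ids
        return result

    ServingChat._preprocess_chat = patched

apply_server_patch()
```

In the PR, the analogous call is made via ray remote so that it executes inside the vLLM server's actor process rather than in the driver.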