Conversation
Signed-off-by: cmunley1 <[email protected]>
Code Review
This pull request introduces the NVIDIA NeMo Gym integration for verl, enabling scalable multi-environment RL training. Key additions include the NemoGymAgentLoopManager for rollout collection, a JSONL dataset loader, and vLLM server patches to ensure retokenization consistency during multi-step rollouts. The review feedback identifies several architectural and stability concerns: fragile dependencies on vLLM's internal data structures, tight coupling with verl's private methods, and potential resource exhaustion issues related to high aiohttp limits and memory-intensive dataset loading. Additionally, the reviewer suggests using public Ray APIs and more robust port-binding logic to prevent race conditions.
```python
# tested on vLLM 0.17.0. other versions may error
template_prefix_ids = prefix_res[1][0]["prompt_token_ids"]

tok = self.renderer.get_tokenizer()
engine_prompt = res[1][0]
engine_prompt["prompt_token_ids"] = _replace_prefix_tokens(
    required_prefix,
    template_prefix_ids,
    engine_prompt["prompt_token_ids"],
    tok,
)
return res
```
The dependency on the specific index res[1][0] and prefix_res[1][0] from vLLM's internal _preprocess_chat output is extremely fragile. This structure is not part of vLLM's public API and is subject to change between versions, which will break this integration. This creates a high maintenance burden for future vLLM updates.
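One way to contain the fragility is to isolate the internal-layout access behind a single helper that fails with a clear diagnostic instead of a bare `IndexError`/`KeyError` when a vLLM upgrade changes the shape. The tuple layout assumed below (`(messages, engine_prompts)` with `engine_prompts[0]["prompt_token_ids"]`) is this sketch's assumption, mirroring the snippet above, not a documented vLLM contract:

```python
def extract_prompt_token_ids(preprocess_result):
    """Defensively pull prompt_token_ids out of a _preprocess_chat-style
    result. The (messages, engine_prompts) layout is a vLLM internal
    detail (assumed here), so raise a descriptive error rather than a
    bare IndexError/KeyError when that layout changes between versions."""
    try:
        engine_prompt = preprocess_result[1][0]
        return engine_prompt["prompt_token_ids"]
    except (IndexError, KeyError, TypeError) as exc:
        raise RuntimeError(
            "Unexpected _preprocess_chat output layout; the vLLM internal "
            f"structure this integration relies on may have changed: {exc}"
        ) from exc
```

This does not remove the dependency, but it centralizes it in one place and turns a silent version skew into an actionable error at the first call.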
```python
# _postprocess is an AgentLoopWorker method that stacks _InternalAgentLoopOutput
# into a DataProto batch. It accesses self.distillation_enabled and
# self.reward_loop_worker_handles, both of which we set in _init_nemo_gym.
_postprocess = _AgentLoopWorker._postprocess
```
```python
initial_global_cfg.setdefault("global_aiohttp_connector_limit_per_host", 16_384)
initial_global_cfg.setdefault("global_aiohttp_connector_limit", 65_536)
```
The configured aiohttp connector limits (16k per host, 64k total) are very high and may exceed the system's open file descriptor limit (ulimit -n). This could lead to OSError: [Errno 24] Too many open files under heavy load. Consider setting these to more conservative values or ensuring the execution environment is configured with a sufficiently high ulimit.
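A conservative option is to clamp the defaults against the process's soft file-descriptor limit before handing them to the connector. This is a sketch of that idea (the `reserve` margin for non-connection fds is an arbitrary assumption, not a NeMo Gym or aiohttp setting):

```python
import resource


def clamp_connector_limits(limit_total, limit_per_host, reserve=1024):
    """Clamp aiohttp connector limits so the total connection count stays
    under the soft RLIMIT_NOFILE, keeping `reserve` descriptors free for
    files, pipes, and sockets opened elsewhere in the process."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    budget = max(soft - reserve, 1)
    clamped_total = min(limit_total, budget)
    clamped_per_host = min(limit_per_host, clamped_total)
    return clamped_total, clamped_per_host
```

With the PR's defaults, an environment whose `ulimit -n` is below ~64k would then get automatically reduced limits instead of failing at runtime with `Errno 24`.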
```python
initial_global_cfg.setdefault("uv_venv_dir", str(nemo_gym_root))
initial_global_cfg.setdefault("skip_venv_if_present", True)

node_ip = ray._private.services.get_node_ip_address()
```
Using ray._private.services.get_node_ip_address() is risky as private APIs are not stable and can change or be removed in any Ray update. Use the public ray.util.get_node_ip_address() instead.
```diff
- node_ip = ray._private.services.get_node_ip_address()
+ node_ip = ray.util.get_node_ip_address()
```
```python
with socket.socket() as s:
    s.bind(("", 0))
    head_port = s.getsockname()[1]
initial_global_cfg[HEAD_SERVER_KEY_NAME] = {"host": "0.0.0.0", "port": head_port}
```
Closing the socket s before the RunHelper actually binds to head_port creates a race condition where another process on the system could claim the port in the interval. It is more robust to let the underlying server bind to port 0 directly and then query the assigned port from the running instance, if the NeMo Gym API supports it.
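If the NeMo Gym API cannot bind port 0 itself, the race can still be avoided by keeping the kernel-assigned socket open until the server takes ownership of it, rather than closing it first. A minimal sketch of the bind-then-query pattern (the function name is illustrative, not part of any API here):

```python
import socket


def start_on_ephemeral_port():
    """Bind a listening socket to port 0 so the kernel assigns a free
    port, then query it with getsockname(). The socket stays open, so
    no other process can claim the port before the server uses it."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind(("0.0.0.0", 0))
    sock.listen()
    _host, port = sock.getsockname()
    return sock, port
```

The key difference from the snippet above is that the returned socket is never closed before use; the server either inherits the open file descriptor or the port is only released once the real listener is ready.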
```python
with open(path) as f:
    for line in f:
        line = line.strip()
        if line:
            self._rows.append(json.loads(line))
```
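On the memory concern raised for large JSONL files: instead of materializing every parsed row eagerly, the dataset could index byte offsets at construction time and parse rows on demand. A sketch of that alternative (the class name and interface are hypothetical, not verl's dataset API):

```python
import json


class LazyJSONLIndex:
    """Index a JSONL file by byte offset so rows are parsed on demand,
    keeping only one integer per row in memory instead of every record."""

    def __init__(self, path):
        self._path = path
        self._offsets = []
        with open(path, "rb") as f:
            offset = 0
            for line in f:
                if line.strip():
                    self._offsets.append(offset)
                offset += len(line)

    def __len__(self):
        return len(self._offsets)

    def __getitem__(self, idx):
        with open(self._path, "rb") as f:
            f.seek(self._offsets[idx])
            return json.loads(f.readline())
```

This trades a file open + seek per access for a near-constant memory footprint, which is usually a good deal for multi-gigabyte rollout datasets.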
What does this PR do?
Integrates NVIDIA NeMo Gym RL Environments, enabling multi-environment and custom-agent RL on top of verl's Megatron vLLM path.

- `recipe/nemo_gym/agent_loop.py`: `NemoGymAgentLoopManager` overrides `generate_sequences()` to offload rollout collection to NeMo Gym's `RolloutCollectionHelper`, including full multi-turn loops, tools, agent harnesses, environment logic, and reward functions. Formats results back into verl's `DataProto`.
- `recipe/nemo_gym/dataset.py`: `NemoGymJSONLDataset` loads the NeMo Gym JSONL format into verl's data pipeline.
- `recipe/nemo_gym/server_patch.py`: patches vLLM's `OpenAIServingChat` and `OpenAIServingTokenization` to correct token ID mismatches caused by chat-template retokenization in multi-turn rollouts, matching NeMo RL's approach. Requires verl#5833 ("[worker, vllm] feat: expose nemo gym token id patch") to expose the patch entrypoint on the vLLM server.
- `recipe/nemo_gym/configs/`: per-environment config YAMLs specifying `nemo_gym_root` and `config_paths`. Passed via `agent_loop_config_path`.
- `recipe/nemo_gym/submit_{math,workplace,multienv}.sh`: example Slurm submission scripts.

NeMo Gym agents and environments communicate over the HTTP OpenAI Responses API. Multi-turn token IDs are re-tokenized each turn via the chat template, so verl's vLLM server is patched (only on the NeMo Gym path) to correct retokenization errors. Requires verl-project/verl#5833 to apply the patch defined here. Agent Gateway may remove the need for this patch. Tool, user, and environment role masking (mask=0 for non-assistant tokens, mask=1 for assistant tokens) is applied in postprocessing.
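The role-masking rule described above (train only on assistant tokens) can be sketched as a flat per-token mask builder. The `(role, token_ids)` turn structure here is an illustrative assumption; verl's actual postprocessing operates on `DataProto` batches:

```python
def build_loss_mask(turns):
    """Build a flat per-token loss mask from (role, token_ids) turns:
    1 for assistant tokens (included in the loss), 0 for tool, user,
    and environment tokens (excluded)."""
    mask = []
    for role, token_ids in turns:
        value = 1 if role == "assistant" else 0
        mask.extend([value] * len(token_ids))
    return mask
```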
Tested on 9 NeMo Gym environments, including multi-turn and multi-environment setups.
Important
Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.
- Run `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always`.
- Request CI in the `ci-request` channel in the `verl` Slack workspace. (If not accessible, please try the Feishu group (飞书群).)
- If updating the `recipe` submodule, please also update the reference to the submodule commit via `git submodule update --remote` or `cd recipe && git pull origin main`.