3 changes: 3 additions & 0 deletions skyrl/train/utils/utils.py
@@ -605,6 +605,9 @@ def prepare_runtime_environment(cfg: SkyRLTrainConfig) -> dict[str, str]:
# TODO(sumanthrh): introduce a debug mode and add debugging flags like `CUDA_LAUNCH_BLOCKING` here
env_vars = {}

# manually set this for testing everywhere
env_vars["VLLM_USE_RAY_V2_EXECUTOR_BACKEND"] = "1"
Comment on lines +608 to +609
Contributor
medium

Hardcoding VLLM_USE_RAY_V2_EXECUTOR_BACKEND to "1" unconditionally prevents users from overriding this setting via environment variables. It is better to check if the variable is already set in os.environ and only apply the default if it is missing. This ensures that users can explicitly disable the V2 executor if they encounter issues. Additionally, since this is a vLLM-specific setting, it would ideally be placed within the vLLM backend check block (around line 629) to avoid polluting the environment for other backends.

Suggested change
# manually set this for testing everywhere
env_vars["VLLM_USE_RAY_V2_EXECUTOR_BACKEND"] = "1"
# Use Ray V2 executor for vLLM by default, but allow override from environment
env_vars["VLLM_USE_RAY_V2_EXECUTOR_BACKEND"] = os.environ.get("VLLM_USE_RAY_V2_EXECUTOR_BACKEND", "1")
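The suggested change covers the override half of the review. Below is a minimal sketch that also covers the second half: it applies the default only for the vLLM backend. The function name `build_inference_env_vars`, its `backend` and `environ` parameters, and the backend string `"vllm"` are hypothetical. This diff does not show the actual backend check near line 629 of `prepare_runtime_environment`.

```python
import os
from typing import Mapping, Optional

# Hypothetical backend identifier; the real check in prepare_runtime_environment is not shown here.
_VLLM_BACKEND = "vllm"


def build_inference_env_vars(
    backend: str, environ: Optional[Mapping[str, str]] = None
) -> dict[str, str]:
    """Hypothetical helper: vLLM-only default for the Ray V2 executor that users can override."""
    # Read from the real process environment unless a mapping is injected (handy for tests).
    environ = os.environ if environ is None else environ
    env_vars: dict[str, str] = {}
    if backend == _VLLM_BACKEND:
        # Respect an explicit user value (e.g. "0" to disable the V2 executor); otherwise default to "1".
        env_vars["VLLM_USE_RAY_V2_EXECUTOR_BACKEND"] = environ.get(
            "VLLM_USE_RAY_V2_EXECUTOR_BACKEND", "1"
        )
    # Non-vLLM backends get no vLLM-specific variables, so their environment stays clean.
    return env_vars
```

For example, `build_inference_env_vars("vllm", {})` yields `{"VLLM_USE_RAY_V2_EXECUTOR_BACKEND": "1"}`. Passing `{"VLLM_USE_RAY_V2_EXECUTOR_BACKEND": "0"}` preserves the user's `"0"`, and any other backend name yields an empty dict. Injecting `environ` keeps the helper testable without mutating `os.environ`.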


# NOTE (erictang000): This should no longer be required since this has been removed in vllm
# and fixed in NCCL (https://github.com/vllm-project/vllm/pull/24141, https://github.com/NVIDIA/nccl/issues/1234), but empirically seeing OOMs for
# that previously ran successfully, so keeping this to maintain backwards compatibility.
1 change: 1 addition & 0 deletions tests/backends/skyrl_train/gpu/gpu_ci/conftest.py
@@ -21,6 +21,7 @@ def _build_ray_env_vars():
"VLLM_USE_V1": "1",
"VLLM_ENABLE_V1_MULTIPROCESSING": "0",
"VLLM_ALLOW_INSECURE_SERIALIZATION": "1",
"VLLM_USE_RAY_V2_EXECUTOR_BACKEND": "1",
"_SKYRL_USE_NEW_INFERENCE": "1" if _SKYRL_USE_NEW_INFERENCE else "0",
}
