[recipe, diffusion] fix: let Ray set Ascend visible devices in Qwen-Image NPU script #227
Open
Sky-Trigger wants to merge 2 commits into
Remove export of RAY_EXPERIMENTAL_NOSET_ASCEND_RT_VISIBLE_DEVICES variable.
Signed-off-by: Trigger <129651635+Sky-Trigger@users.noreply.github.com>
Code Review
This pull request removes the environment variable export RAY_EXPERIMENTAL_NOSET_ASCEND_RT_VISIBLE_DEVICES=1 from the run_qwen_image_ocr_lora_npu.sh script. There are no review comments to evaluate, and I have no additional feedback to provide on this change.
What does this PR do?
This PR removes RAY_EXPERIMENTAL_NOSET_ASCEND_RT_VISIBLE_DEVICES=1 from the Qwen-Image OCR LoRA NPU FlowGRPO example script, run_qwen_image_ocr_lora_npu.sh.
With this environment variable set, Ray skips rewriting ASCEND_RT_VISIBLE_DEVICES for Ascend/NPU workers. In this FlowGRPO Qwen-Image NPU recipe, this can lead to a device placement mismatch across workers.
Observed error:
Removing this override lets Ray manage per-worker Ascend device visibility normally, which fixes the device mismatch in the NPU FlowGRPO training script.
Why is this needed?
RAY_EXPERIMENTAL_NOSET_ASCEND_RT_VISIBLE_DEVICES=1 is useful in some vLLM-Ascend distributed serving setups where device mapping is handled outside Ray. However, for this verl-omni FlowGRPO training recipe, keeping it enabled prevents Ray from setting the expected per-worker device visibility and may cause tensors to be created on the wrong NPU.
This change keeps the example script aligned with Ray-managed NPU placement and avoids the runtime device assertion failure.
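Removing the export from the script does not help if the calling shell already exports the variable, for example after a vLLM-Ascend serving session. A minimal sketch of a guard that could sit near the top of a launch script is shown here. The function name clear_ray_noset_override is hypothetical and is not part of this PR; whether to unset the variable or merely warn is a judgment call for the recipe maintainers.

```shell
# Hypothetical guard, not part of this PR: if the override was inherited
# from the parent environment, warn and unset it so that Ray can set
# ASCEND_RT_VISIBLE_DEVICES for each worker itself.
clear_ray_noset_override() {
    # "${VAR+x}" expands to "x" only when VAR is set (even if empty).
    if [ -n "${RAY_EXPERIMENTAL_NOSET_ASCEND_RT_VISIBLE_DEVICES+x}" ]; then
        echo "warning: RAY_EXPERIMENTAL_NOSET_ASCEND_RT_VISIBLE_DEVICES is set; unsetting it" >&2
        unset RAY_EXPERIMENTAL_NOSET_ASCEND_RT_VISIBLE_DEVICES
    fi
}

clear_ray_noset_override
```

Calling the guard before Ray starts means the workers launched afterwards do not inherit the override from the shell.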
Test
Tested by running the Qwen-Image OCR LoRA NPU FlowGRPO example.
Before this change, the script failed with:
After removing RAY_EXPERIMENTAL_NOSET_ASCEND_RT_VISIBLE_DEVICES=1, the script proceeds without this device mismatch error.
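As a complement to the manual run above, a lightweight static check can confirm that a launch script no longer sets the override. The sketch below is an assumption about how such a check might look; the function name script_sets_noset_override is hypothetical, and it only detects plain or exported assignments at the start of a line, not assignments made indirectly (for example via env or a sourced file).

```shell
# Hypothetical check, not part of this PR: succeeds (exit status 0) if the
# given script assigns the override at the start of a line, with or without
# a leading "export". Commented-out lines do not match.
script_sets_noset_override() {
    grep -Eq '^[[:space:]]*(export[[:space:]]+)?RAY_EXPERIMENTAL_NOSET_ASCEND_RT_VISIBLE_DEVICES=' "$1"
}

# Example usage (the path is supplied by the caller):
#   if script_sets_noset_override "$script_path"; then
#       echo "override is still set in $script_path" >&2
#   fi
```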