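I am running a mix of [single-GPU shared], [single-GPU exclusive], and [multi-GPU exclusive] tasks on the same set of machines in a cluster. On some machines the vcuda-core request is satisfied and the pod is scheduled successfully, but pod creation then fails, and the gpu-manager log keeps reporting "no free node". My guess is that mixing different modes on the same machine causes this. Do tasks in different modes need to be isolated from each other?
Pod status: UnexpectedAdmissionError
Event on the failed pod: Allocate failed due to rpc error: code = Unknown desc = no free node, which is unexpected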
Has anyone else run into this?
I ran into the same problem. The log is below; the resource totals shown in the log are sufficient.
@mingkai-yang Has this problem been resolved?
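
One possible explanation, not confirmed anywhere in this thread, is fragmentation: the node-level total can look sufficient while no single card (or no fully free card) can hold the request. The sketch below is a toy model only, not gpu-manager's actual allocator. It assumes 100 vcuda-core units per physical GPU, that a request below 100 must fit on one card, and that a request of 100 or more needs whole, completely free cards. The function `can_place` and the sample numbers are hypothetical.

```python
# Toy model of per-card placement. This is NOT gpu-manager's real algorithm;
# it only illustrates how a node-level total can hide fragmentation.
CORES_PER_GPU = 100  # assumption: 100 vcuda-core units equal one physical GPU


def can_place(free_per_gpu, request, cores_per_gpu=CORES_PER_GPU):
    """Return True if `request` vcuda-core units fit on a node.

    free_per_gpu: free vcuda-core units on each card of the node.
    """
    if request < cores_per_gpu:
        # Shared mode: the whole request must fit on a single card.
        return any(free >= request for free in free_per_gpu)
    # Exclusive mode: the request needs whole cards that are completely free.
    cards_needed = request // cores_per_gpu
    fully_free = sum(1 for free in free_per_gpu if free == cores_per_gpu)
    return fully_free >= cards_needed


# Hypothetical node with four cards, each partly used by shared pods.
free = [50, 50, 30, 70]
print(sum(free))             # node-level total of free units
print(can_place(free, 100))  # single-GPU exclusive request
print(can_place(free, 60))   # single-GPU shared request
```

In this model the node reports 200 free units, yet an exclusive request for 100 cannot be placed because no card is completely free. If something similar is happening here, comparing the per-card usage on the affected nodes against the request size would confirm or rule it out.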