Allocate failed due to rpc error: code = Unknown desc = no free node, which is unexpected #191

Open
mingkai-yang opened this issue Jan 10, 2024 · 3 comments

Comments

@mingkai-yang

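I am running a mix of [single-GPU shared], [single-GPU exclusive], and [multi-GPU exclusive] tasks on the same set of machines in a cluster. On some machines the vcuda-core requirement is satisfied and the pod is scheduled successfully, but creation then fails, and the gpu-manager log keeps reporting "no free node". My guess is that mixing different modes on the same machine causes this. Do tasks in different modes need to be isolated from one another?

Pod status: UnexpectedAdmissionError
Event on the pod that failed to create: Allocate failed due to rpc error: code = Unknown desc = no free node, which is unexpected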
[image: nofreenode]

@mingkai-yang
Author

Has anyone else run into this?

@xiaoertong

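I have hit the same problem. The log is shown below; according to it, the total available resources are sufficient.
[image: log screenshot]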

@xiaoertong

@mingkai-yang Has this been resolved?
