sock.c:344 UCX ERROR recv(fd=47) failed: Connection reset by peer #8911
Unanswered
smallriver666
asked this question in
Q&A
Replies: 1 comment
-
can you pls add |
-
I encountered this problem when submitting tasks across multiple nodes. I have one management node and five compute nodes. Nodes 2 to 5 are identical servers; node 6 is a different server from the other four. Tasks submit successfully on nodes 2, 3, 4, and 5, but the following error occurs when node 6 is added (I am using Open MPI 4.1.0).
This is the script I used to submit the task: