-
Notifications
You must be signed in to change notification settings - Fork 539
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Resolve copy promises #2038
base: master
Are you sure you want to change the base?
Resolve copy promises #2038
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: sjawhar The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Welcome @sjawhar! |
We gotta stop using `echo` and the like for score logs, it can get too big and need to use stdin and files. Details: * Well, k8s javascript client [has a bug](kubernetes-client/javascript#2038), so I re-implemented a fixed version of it. * Tested it out locally using kind. So I had to make the k8s setup work with non-EKS clusters. * Also documented how to set up local k8s development environment while I was at it Testing: * the automated tests honestly aren't great here. Would feel safer having integration tests against an actual k8s cluster * But here's a screenshot showing a working run, which requires copying `settings.json` into the pod <img width="1231" alt="image" src="https://github.com/user-attachments/assets/07512016-fa9e-4d7a-953a-c6a0445c32fb"> * I also tested that I was able to copy a large score log that broke the previous version of the function * Here's a task test ![image](https://github.com/user-attachments/assets/cc3dd29d-a266-4de8-b29a-1d85a37c147b) * Test of a big score log <img width="1851" alt="image" src="https://github.com/user-attachments/assets/a450fdf1-a375-40fd-bcc1-20c4c438698b">
Hey @sjawhar thanks for your pr but I think the tests are broken, could you have a look? |
src/cp.ts
Outdated
@@ -66,6 +66,11 @@ export class Cp { | |||
} | |||
}, | |||
) | |||
.then((conn) => { | |||
conn.onclose = (event) => { | |||
resolve(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a resolve above at line 63. I don't think that this resolve is required. In fact I'm not sure it's what we want, because this will resolve when the websocket is returned. We want to resolve when the copy completes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That callback only gets called if status != null
:
Lines 53 to 62 in 8db906a
const conn = await this.handler.connect(path, null, (streamNum: number, buff: Buffer): boolean => { | |
const status = WebSocketHandler.handleStandardStreams(streamNum, buff, stdout, stderr); | |
if (status != null) { | |
if (statusCallback) { | |
statusCallback(status); | |
} | |
return false; | |
} | |
return true; | |
}); |
For copies (or maybe for all exec
calls that use stdin
), status == null
, so the callback never gets called, so the promise never resolves. await
ing calls to cpToPod
hang forever.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the copy isn't complete when the websocket connection is closed, how can the caller ever know that it has completed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I missed that you are doing this on the 'close' event. My mistake. But we should probably make sure that we don't resolve the promise twice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, added a check for that in a way that shouldn't cause things to break if changes elsewhere cause this logic to change.
Yeah, sorry about that, I had quickly made the edits in the browser to make sure I didn't forget. Now I've actually, like, tested it and stuff 😅 |
Is there a test that can be added for this? |
I've spend some time on it but didn't figure out the different mocks well enough to understand why the tests were passing before instead of timing out as expected. The lines in the test that send a If it helps at all, I've read through the code and set breakpoints to interactively debug and this the issue largely comes down to this: javascript/src/web-socket-handler.ts Lines 55 to 73 in 88da3bd
It looks to me like when stdin is provided, the websocket is closed without waiting for a |
The promise returned by
cpToPod
never resolves. Followed the example from here to fix: #982 (comment)