-
Notifications
You must be signed in to change notification settings - Fork 89
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BE][drci] Async fetch PR info during reorganizeWorkflows for speed #6380
Conversation
The latest updates on your projects. Learn more about Vercel for Git ↗︎
|
|
||
for (const workflow of dedupedRecentWorkflows) { | ||
const prNumber = workflow.pr_number; | ||
if (!workflowsByPR.has(prNumber)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did this version check for the prNumber and only apply the logic to the entries missing it, while the new version doesn't need it and runs the code against all entries?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The PR regroups and reorders things so that the slow fetches can happen outside of the sequential for loop. The same number of api calls should happen, they just happen at a different time. Here's some pseudo code that kinda explains whats going on
Goal: construct map<pr number, pr info and job list>
Old:
for each workflow (sequential because of for loop without async modifier):
add to map
if workflow's pr hasn't been seen yet (new key), fetch pr info and maybe info about the commit (slow)
New:
for each workflow (sequential):
add to map
for each key (pr number) in map (technically also sequential, but the mapped function is async, creating promises which are all awaited together):
fetch pr info and maybe info about commit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM if you see the improvement here. I thought that using await fetchCommitTimestamp
and await fetchPR
should already fetch PR asynchronously, the .map()
part then runs these request concurrently cutting down the delay. I usually test Dr.CI with a specific PR to make this run faster, i.e. http://localhost:3000/api/drci/drci?prNumber=123326
Not sure if github is good enough at showing the diff here since the highlighting what is indentation is helpful for seeing what actually changed. My vscode changed files is more helpful
Instead of doing a synchronous for loop, use async to reduce the time needed to wait for api calls.
This reduces the time this function takes from ~45s -> 8s (obviously depends on how many PRs, but I assume this is pretty representative of calling drci on pytorch/pytorch during the workday). Waiting for the 45s is kind of annoying when trying to test changes to dr ci
I tested that the result of this didn't change by comparing the results of of the output of the function