Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BE][drci] Async fetch PR info during reorganizeWorkflows for speed #6380

Merged
merged 5 commits into from
Mar 18, 2025

Conversation

clee2000
Copy link
Contributor

@clee2000 clee2000 commented Mar 7, 2025

Not sure if github is good enough at showing the diff here since the highlighting what is indentation is helpful for seeing what actually changed. My vscode changed files is more helpful

Instead of doing a synchronous for loop, use async to reduce the time needed to wait for api calls.

This reduces the time this function takes from ~45s -> 8s (obviously depends on how many PRs, but I assume this is pretty representative of calling drci on pytorch/pytorch during the workday). Waiting for the 45s is kind of annoying when trying to test changes to dr ci

I tested that the result of this didn't change by comparing the results of of the output of the function

Copy link

vercel bot commented Mar 7, 2025

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Updated (UTC)
torchci ✅ Ready (Inspect) Visit Preview Mar 7, 2025 7:40pm

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Mar 7, 2025
@clee2000 clee2000 marked this pull request as ready for review March 7, 2025 19:43
@clee2000 clee2000 requested a review from a team March 7, 2025 19:43
@clee2000 clee2000 changed the title [drci] Async fetch PR info during reorganizeWorkflows to make func faster [BE][drci] Async fetch PR info during reorganizeWorkflows to make func faster Mar 7, 2025
@clee2000 clee2000 changed the title [BE][drci] Async fetch PR info during reorganizeWorkflows to make func faster [BE][drci] Async fetch PR info during reorganizeWorkflows for speed Mar 12, 2025

for (const workflow of dedupedRecentWorkflows) {
const prNumber = workflow.pr_number;
if (!workflowsByPR.has(prNumber)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did this version check for the prNumber and only apply the logic to the entries missing it, while the new version doesn't need it and runs the code against all entries?

Copy link
Contributor Author

@clee2000 clee2000 Mar 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR regroups and reorders things so that the slow fetches can happen outside of the sequential for loop. The same number of api calls should happen, they just happen at a different time. Here's some pseudo code that kinda explains whats going on

Goal: construct map<pr number, pr info and job list>
Old:

for each workflow (sequential because of for loop without async modifier):
  add to map
  if workflow's pr hasn't been seen yet (new key), fetch pr info and maybe info about the commit (slow)

New:

for each workflow (sequential):
  add to map
for each key (pr number) in map (technically also sequential, but the mapped function is async, creating promises which are all awaited together):
  fetch pr info and maybe info about commit 

Copy link
Contributor

@huydhn huydhn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM if you see the improvement here. I thought that using await fetchCommitTimestamp and await fetchPR should already fetch PR asynchronously, the .map() part then runs these request concurrently cutting down the delay. I usually test Dr.CI with a specific PR to make this run faster, i.e. http://localhost:3000/api/drci/drci?prNumber=123326

@clee2000 clee2000 merged commit 17536fd into main Mar 18, 2025
6 checks passed
@clee2000 clee2000 deleted the csl/drci_reduce_time_for_reorgworkflows branch March 18, 2025 16:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants