Add scaleupchron job for queued runners #6389

Draft: wants to merge 1 commit into main

@Camyll (Contributor) commented Mar 11, 2025

Addresses issue pytorch/pytorch#143041

Adds a scaleupchron job that checks the queue for jobs that have been queued for a long time and directly calls scale up for them.

Cherry-picked from Zain's original PR #6018.
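
For readers skimming the thread, the selection step described above can be sketched roughly as follows. This is an illustrative Python sketch, not the code in this PR: the record shape (`runner_label`, `num_queued_jobs`, `queue_time_minutes`), the function name, and the sample data are all assumptions. Only the 30-minute threshold comes from the diff (`SCALE_UP_MIN_QUEUE_TIME_MINUTES`).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueuedRunnerType:
    """Hypothetical aggregate record for one runner label (assumed shape)."""
    runner_label: str
    num_queued_jobs: int
    queue_time_minutes: int  # how long jobs for this label have been waiting


def select_scale_up_requests(records, min_queue_time_minutes=30):
    """Return {runner_label: runners_to_request} for jobs queued too long."""
    requests = {}
    for rec in records:
        # Nothing is waiting for this label, so there is nothing to scale.
        if rec.num_queued_jobs <= 0:
            continue
        # Still within the tolerated queue time; leave it to normal scale up.
        if rec.queue_time_minutes < min_queue_time_minutes:
            continue
        requests[rec.runner_label] = (
            requests.get(rec.runner_label, 0) + rec.num_queued_jobs
        )
    return requests


# Made-up sample data, for illustration only.
records = [
    QueuedRunnerType("linux.2xlarge", 4, 45),
    QueuedRunnerType("linux.g5.4xlarge.nvidia.gpu", 2, 10),
    QueuedRunnerType("windows.4xlarge", 0, 90),
]
print(select_scale_up_requests(records))
```

In this sketch only the first label qualifies: the second has not waited long enough and the third has no queued jobs. The real job would then pass each selected label to the existing scale up path.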

vercel bot commented Mar 11, 2025

1 skipped deployment: torchci, status Ignored, updated Mar 11, 2025 6:12pm (UTC)

@facebook-github-bot added the CLA Signed label Mar 11, 2025
SCALE_CONFIG_REPO = var.scale_config_repo
SCALE_CONFIG_REPO_PATH = var.scale_config_repo_path
SCALE_UP_MIN_QUEUE_TIME_MINUTES = 30
SCALE_UP_RECORD_QUEUE_URL = "https://hud.pytorch.org/api/clickhouse/queued_jobs_aggregate?parameters=%5B%5D"

@Camyll (Contributor Author) commented on SCALE_UP_RECORD_QUEUE_URL:

TODO: need to get actual URL from Zain
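
Since the URL is still a placeholder, it may help to validate both settings when the job starts, so a bad value fails loudly instead of silently skipping scale up. A minimal Python sketch under stated assumptions: the function name and returned dict keys are hypothetical, and defaulting to 30 minutes simply mirrors the value in the diff rather than any confirmed behavior.

```python
import os
from urllib.parse import urlparse


def load_scale_up_chron_config(env=None):
    """Read and validate the two settings from the diff (hypothetical helper)."""
    env = os.environ if env is None else env

    # Assumption: fall back to the 30 minutes used in the diff when unset.
    minutes = int(env.get("SCALE_UP_MIN_QUEUE_TIME_MINUTES", "30"))
    if minutes < 0:
        raise ValueError("SCALE_UP_MIN_QUEUE_TIME_MINUTES must be >= 0")

    url = env.get("SCALE_UP_RECORD_QUEUE_URL", "")
    parsed = urlparse(url)
    # Reject missing or malformed URLs early.
    if parsed.scheme != "https" or not parsed.netloc:
        raise ValueError("SCALE_UP_RECORD_QUEUE_URL must be an https URL")

    return {"min_queue_time_minutes": minutes, "record_queue_url": url}


# Values copied from the diff; the URL is the placeholder noted in the TODO.
config = load_scale_up_chron_config({
    "SCALE_UP_MIN_QUEUE_TIME_MINUTES": "30",
    "SCALE_UP_RECORD_QUEUE_URL": (
        "https://hud.pytorch.org/api/clickhouse/"
        "queued_jobs_aggregate?parameters=%5B%5D"
    ),
})
print(config["min_queue_time_minutes"])
```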

@Camyll force-pushed the camyllh/scale_up_chron branch from a4d879b to 1ab676a on March 11, 2025 17:49
@Camyll force-pushed the camyllh/scale_up_chron branch from 1ab676a to d53b9bc on March 11, 2025 18:12