[queue time] Add Job Queue Time hook #6417
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,26 @@ | ||
name: Update job queue times dataset | ||
|
||
on: | ||
schedule: | ||
# Run every 15 minutes | ||
- cron: "*/15 * * * *" | ||
workflow_dispatch: | ||
|
||
defaults: | ||
run: | ||
working-directory: torchci | ||
jobs: | ||
update-queue-times: | ||
runs-on: ubuntu-24.04 | ||
permissions: | ||
id-token: write | ||
steps: | ||
- uses: actions/checkout@v4 | ||
- run: yarn install --frozen-lockfile | ||
- name: configure aws credentials | ||
id: aws_creds | ||
uses: aws-actions/configure-aws-credentials@v3 | ||
with: | ||
role-to-assume: arn:aws:iam::308535385114:role/gha_workflow_update_queue_times | ||
FYI, I used the same AWS role as update-queue-times.yml.
Don't do that; use a separate role for each use case (UC).
Sounds good.
||
aws-region: us-east-1 | ||
- run: yarn node scripts/updatJobQueueTimes.mjs |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,30 @@ | ||
// We compute queue times by looking at a snapshot of jobs in CI that are | ||
// currently queued and seeing how long they've existed. This approach doesn't | ||
// give us historical data, so write our snapshot regularly to s3 so we can get | ||
// a view of the queue over time. | ||
// This script updates the job queue times in the S3 bucket for each job. | ||
import { PutObjectCommand, S3Client } from "@aws-sdk/client-s3"; | ||
|
||
export function getS3Client() { | ||
return new S3Client({ | ||
region: "us-east-1", | ||
}); | ||
} | ||
|
||
const s3client = getS3Client(); | ||
|
||
// %7B%7D = encoded {} | ||
const response = await fetch( | ||
"https://hud.pytorch.org/api/clickhouse/queued_jobs?parameters=%7B%7D" | ||
).then((r) => r.json()); | ||
|
||
const unixTime = Math.floor(Date.now() / 1000); | ||
const json_records = response.map((item) => JSON.stringify(item)).join("\n"); | ||
|
||
await s3client.send( | ||
new PutObjectCommand({ | ||
Bucket: "ossci-raw-job-status", | ||
Key: `job_queue_times_historical/${unixTime}.txt`, | ||
Body: json_records, | ||
}) | ||
); |
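The serialization and key-naming steps in the script above can be pulled out into small pure functions, which makes them easy to check without touching S3 or the HUD API. This is a minimal sketch, not the PR's code: the function names `toNdjson`, `snapshotKey`, and `parseNdjson` are hypothetical, and `parseNdjson` assumes a consumer reads each snapshot back as one JSON object per line.

```javascript
// Hypothetical helpers; the names are illustrative and not part of the PR.

// Serialize an array of records as newline-delimited JSON (one object per line),
// mirroring the `json_records` expression in the script.
function toNdjson(records) {
  return records.map((record) => JSON.stringify(record)).join("\n");
}

// Build the S3 object key from a millisecond timestamp, truncated to whole seconds,
// mirroring the `unixTime` computation in the script.
function snapshotKey(nowMs) {
  const unixTime = Math.floor(nowMs / 1000);
  return `job_queue_times_historical/${unixTime}.txt`;
}

// Assumed reader side: parse a snapshot body back into records, skipping blank lines.
function parseNdjson(text) {
  return text
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line));
}
```

With these in place, the upload call would only need `Key: snapshotKey(Date.now())` and `Body: toNdjson(response)`.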
I don't think it is good practice to use GHA CI for cron jobs; I believe we should be using Lambdas or something similar for this.
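For illustration only, here is roughly what the same snapshot logic might look like as a Lambda-style handler. This is a sketch under assumptions, not a proposal for the actual implementation: `makeHandler`, `fetchJobs`, and `putObject` are hypothetical names, and the fetch and S3 upload are injected so the logic can run without network access. The bucket and key prefix are copied from the script in this PR.

```javascript
// Hypothetical factory: builds an async handler from injected dependencies.
// - fetchJobs: async function returning the array of currently queued jobs
// - putObject: async function that uploads { Bucket, Key, Body }
// - now: clock returning milliseconds since the epoch (defaults to Date.now)
function makeHandler({ fetchJobs, putObject, now = Date.now }) {
  return async function handler() {
    const jobs = await fetchJobs();
    const unixTime = Math.floor(now() / 1000);
    const key = `job_queue_times_historical/${unixTime}.txt`;
    // One JSON object per line, as in the script.
    const body = jobs.map((job) => JSON.stringify(job)).join("\n");
    await putObject({ Bucket: "ossci-raw-job-status", Key: key, Body: body });
    return { key, count: jobs.length };
  };
}
```

In a real deployment, `fetchJobs` would wrap the HUD `queued_jobs` request and `putObject` would wrap `s3client.send(new PutObjectCommand(...))`; the scheduling would then live in the Lambda trigger rather than in a workflow cron.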
Just curious, is there a reason why the old one exists?