[queue time] Add Job Queue Time hook #6417

Closed
wants to merge 11 commits into from
Changes from 4 commits
26 changes: 26 additions & 0 deletions .github/workflows/update-job-queue-times.yml
@@ -0,0 +1,26 @@
name: Update job queue times dataset
Contributor

I don't think it is a good standard to use GHA CI for cron jobs; I believe we should be using Lambdas or something similar for this.

Contributor Author

Just curious, is there a reason why the old one exists?
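
The Lambda-style alternative raised in this thread could be sketched roughly as follows. This is a minimal illustration, not code from the PR: `makeQueueTimeHandler`, `fetchQueuedJobs`, and `putObject` are hypothetical names, and the injected functions stand in for the HUD request and the S3 upload so the logic can be exercised without network access.

```javascript
// Hypothetical sketch: makeQueueTimeHandler, fetchQueuedJobs, and putObject
// are illustrative names and do not appear in the PR.
function makeQueueTimeHandler({ fetchQueuedJobs, putObject, now = Date.now }) {
  // Returns an async function with the shape of a scheduled Lambda handler.
  return async function handler() {
    const jobs = await fetchQueuedJobs();
    const unixTime = Math.floor(now() / 1000);
    // Same key layout as scripts/updateJobQueueTimes.mjs.
    const key = `job_queue_times_historical/${unixTime}.txt`;
    // One JSON document per line, matching the script's output format.
    const body = jobs.map((job) => JSON.stringify(job)).join("\n");
    await putObject({ key, body });
    return { key, count: jobs.length };
  };
}
```

In a real deployment, `fetchQueuedJobs` would presumably wrap the request to the `queued_jobs` endpoint and `putObject` would wrap an S3 `PutObjectCommand`; the 15-minute schedule would then live in the scheduler that invokes the function rather than in a workflow file.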


on:
  schedule:
    # Run every 15 minutes
    - cron: "*/15 * * * *"
  workflow_dispatch:

defaults:
  run:
    working-directory: torchci
jobs:
  update-queue-times:
    runs-on: ubuntu-24.04
    permissions:
      id-token: write
    steps:
      - uses: actions/checkout@v4
      - run: yarn install --frozen-lockfile
      - name: configure aws credentials
        id: aws_creds
        uses: aws-actions/configure-aws-credentials@v3
        with:
          role-to-assume: arn:aws:iam::308535385114:role/gha_workflow_update_queue_times
Contributor Author

@yangw-dev, Mar 15, 2025

FYI, I used the same AWS role as update-queue-times.yml.
@jeanschmidt

Contributor

Don't do that; use a separate role for each use case.

Contributor Author

Sounds good.

          aws-region: us-east-1
      - run: yarn node scripts/updateJobQueueTimes.mjs
3 changes: 2 additions & 1 deletion torchci/clickhouse_queries/queued_jobs/query.sql
@@ -16,7 +16,8 @@ SELECT
job.created_at,
CURRENT_TIMESTAMP()
) AS queue_s,
-CONCAT(workflow.name, ' / ', job.name) AS name,
+workflow.name AS workflow_name,
+job.name AS job_name,
job.html_url,
IF(
LENGTH(job.labels) = 0,
10 changes: 9 additions & 1 deletion torchci/components/metrics/panels/TablePanel.tsx
@@ -21,6 +21,8 @@ export default function TablePanel({
helpLink,
// An optional flag to show the table footer
showFooter,
// A custom function to modify the data before it is passed to table rendering
dataModifier,
}: {
title: string;
queryName: string;
@@ -29,6 +31,7 @@ export default function TablePanel({
dataGridProps: any;
helpLink?: string;
showFooter?: boolean;
dataModifier?: (data: any) => any;
}) {
const url = `/api/clickhouse/${queryName}?parameters=${encodeURIComponent(
JSON.stringify(queryParams)
@@ -38,10 +41,15 @@ export default function TablePanel({
refreshInterval: 5 * 60 * 1000, // refresh every 5 minutes
});

let inputData = data;
if (dataModifier !== undefined && data != undefined) {
inputData = dataModifier(data);
}

return (
<TablePanelWithData
title={title}
-data={data}
+data={inputData}
columns={columns}
dataGridProps={dataGridProps}
helpLink={helpLink}
6 changes: 6 additions & 0 deletions torchci/pages/metrics.tsx
@@ -778,6 +778,12 @@ export default function Page() {
},
getRowId: (el: any) => el.html_url,
}}
dataModifier={(data: any[]) => {
return data.map((el) => {
el.name = el.workflow_name + "/" + el.job_name;
return el;
});
}}
/>
</Grid2>
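
The `dataModifier` hook rebuilds the display name on the client now that the query returns `workflow_name` and `job_name` separately. A minimal sketch of a non-mutating variant follows; `withDisplayName` and `applyModifier` are hypothetical helper names (not in the PR), and the `" / "` separator is an assumption chosen to match the string the SQL `CONCAT` used to produce.

```javascript
// Hypothetical helpers; neither name appears in the PR.

// Builds a new row with a combined display name instead of mutating the input,
// so the object cached by the data-fetching layer is left untouched.
function withDisplayName(rows) {
  return rows.map((row) => ({
    ...row,
    name: `${row.workflow_name} / ${row.job_name}`,
  }));
}

// Mirrors the guard in TablePanel: the modifier runs only when both the
// modifier and the data are present (data is undefined while loading).
function applyModifier(data, dataModifier) {
  if (dataModifier !== undefined && data != undefined) {
    return dataModifier(data);
  }
  return data;
}
```

Passing `withDisplayName` as the `dataModifier` prop would then be equivalent to the inline arrow function, except that the fetched rows are copied rather than modified in place.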

30 changes: 30 additions & 0 deletions torchci/scripts/updateJobQueueTimes.mjs
@@ -0,0 +1,30 @@
// We compute queue times by looking at a snapshot of jobs in CI that are
// currently queued and seeing how long they've existed. This approach doesn't
// give us historical data, so write our snapshot regularly to s3 so we can get
// a view of the queue over time.
// This script writes the current queue time of each job to the S3 bucket.
import { PutObjectCommand, S3Client } from "@aws-sdk/client-s3";

export function getS3Client() {
  return new S3Client({
    region: "us-east-1",
  });
}

const s3client = getS3Client();

// %7B%7D = encoded {}
const response = await fetch(
  "https://hud.pytorch.org/api/clickhouse/queued_jobs?parameters=%7B%7D"
).then((r) => r.json());

const unixTime = Math.floor(Date.now() / 1000);
const json_records = response.map((item) => JSON.stringify(item)).join("\n");

// Await the upload so a failed write rejects at the top level and fails the job.
await s3client.send(
  new PutObjectCommand({
    Bucket: "ossci-raw-job-status",
    Key: `job_queue_times_historical/${unixTime}.txt`,
    Body: json_records,
  })
);
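
Each snapshot written by this script is a newline-delimited JSON file whose key encodes the capture time. A minimal sketch of how a consumer might read one back follows; `parseSnapshot` is a hypothetical helper (not part of the PR), and it assumes the key layout and line format shown in the script.

```javascript
// Hypothetical helper; parseSnapshot does not appear in the PR.
// Recovers the capture time from the object key and the records from the body.
function parseSnapshot(key, body) {
  const match = /^job_queue_times_historical\/(\d+)\.txt$/.exec(key);
  if (match === null) {
    throw new Error(`unexpected snapshot key: ${key}`);
  }
  // JSON.stringify without an indent argument never emits a raw newline,
  // so splitting on "\n" recovers exactly one record per line.
  // An empty body means no jobs were queued when the snapshot was taken.
  const records =
    body === "" ? [] : body.split("\n").map((line) => JSON.parse(line));
  return { unixTime: Number(match[1]), records };
}
```

Because the timestamp lives in the key, a consumer can order snapshots by listing keys alone and only download the bodies it needs.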