How to add a new custom table on ClickHouse
Skip this part if you already have access to the PyTorch Dev Infra ClickHouse cluster on https://console.clickhouse.cloud
For metamates, go to https://console.clickhouse.cloud and log in with your Meta email. The portal uses SSO, so you just need to follow the steps in your browser to request access. We grant read-only access by default.
The first thing to do is to take a look at https://clickhouse.com/docs/en/sql-reference/statements/create/table to get familiar with the CH SQL syntax. Note that there are several databases available on our cluster. The most important ones are:
- The `default` database that includes all GitHub events.
- The `benchmark` database for all benchmark and metrics data.
If you need a new database, please create an issue and book an office hours (OH) slot with us at https://github.com/pytorch/pytorch/wiki/Dev-Infra-Office-Hours.
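For orientation, here is a minimal sketch of the kind of statement described in the ClickHouse documentation linked above, written for a table in the `benchmark` database. The table name `my_custom_metrics`, every column, and the choice of `MergeTree` with this sort key are hypothetical and serve only to illustrate the syntax. The schema and engine you actually need depend on your data and on what the reviewers recommend.

```sql
-- Hypothetical example: the table name, columns, and sort key are placeholders.
CREATE TABLE benchmark.my_custom_metrics
(
    `timestamp` DateTime64(3),   -- when the record was produced
    `repo` String,               -- e.g. 'pytorch/pytorch'
    `workflow_id` UInt64,        -- ID of the workflow run that uploaded the record
    `metric_name` String,
    `metric_value` Float64
)
ENGINE = MergeTree               -- assumed engine; confirm the right one during review
ORDER BY (repo, metric_name, timestamp);
```

The `ORDER BY` clause sets the table's sorting key, so it is worth choosing columns that match how the table will usually be filtered.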
All ClickHouse tables are backed by either S3 for immutable records like metrics or DynamoDB for mutable records like GitHub events. In both cases, you will need to:
- Write down your `CREATE TABLE` query with the schema and submit it for review, e.g. https://github.com/pytorch/test-infra/pull/5839. Once it's approved, you can create the table yourself using the CH cloud console if you have the necessary permissions, or ping the reviewer to create it for you.
- For immutable records on S3, make sure that the workflow that uploads the data has permission to do so. This usually means a 3-step process:
  - Consult with PyTorch Dev Infra if you need a new S3 bucket, and submit a PR to create one at https://github.com/pytorch-labs/pytorch-gha-infra/blob/main/runners/s3_bucket.tf if needed.
  - Submit a PR to create a new OIDC role or edit an existing one to grant write permission, e.g. https://github.com/pytorch-labs/pytorch-gha-infra/pull/358.
  - Use the new role in your workflow, e.g. https://github.com/pytorch/executorch/pull/2449.
- For mutable records on DynamoDB,