How to add a new custom table on ClickHouse

Huy Do edited this page Nov 15, 2024 · 24 revisions

First time login

Skip this part if you already have access to the PyTorch Dev Infra ClickHouse cluster at https://console.clickhouse.cloud.

For metamates, go to https://console.clickhouse.cloud and log in with your Meta email. The portal uses SSO, so you only need to follow the steps in your browser to request access. We grant read-only access by default.

Prepare the schema and create the table

Start by reading https://clickhouse.com/docs/en/sql-reference/statements/create/table to get familiar with ClickHouse (CH) SQL syntax. Note that our cluster has several databases. The most important ones are:

  • The default database that includes all GitHub events.
  • The benchmark database for all benchmark and metrics data.

If you need a new database, please create an issue and book an office hour (OH) with us at https://github.com/pytorch/pytorch/wiki/Dev-Infra-Office-Hours.
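
As a rough illustration of what a schema submitted for review might look like, the sketch below assembles a CREATE TABLE statement in Python and prints it. Everything specific here is an assumption made for the example: the table name my_custom_metrics, the column list, the MergeTree engine, and the ORDER BY key are all hypothetical, and only the benchmark database name comes from the list above. Check the engine and sorting key against the ClickHouse documentation and existing tables before submitting a real schema.

```python
# Hypothetical names for illustration only; they are not real tables on the cluster.
DATABASE = "benchmark"       # one of the databases listed above
TABLE = "my_custom_metrics"  # hypothetical table name

# (column name, ClickHouse type) pairs. The schema itself is an assumption.
COLUMNS = [
    ("timestamp", "DateTime64(3)"),
    ("repo", "String"),
    ("workflow_id", "UInt64"),
    ("metric_name", "String"),
    ("metric_value", "Float64"),
]

# Hypothetical sorting key; it must only reference declared columns.
ORDER_BY = ["repo", "metric_name", "timestamp"]


def build_create_table(database, table, columns, order_by):
    """Render a CREATE TABLE statement as a string, ready to paste into a PR."""
    known = {name for name, _ in columns}
    missing = [key for key in order_by if key not in known]
    if missing:
        raise ValueError(f"ORDER BY references unknown columns: {missing}")
    column_lines = ",\n".join(f"    `{name}` {ctype}" for name, ctype in columns)
    return (
        f"CREATE TABLE IF NOT EXISTS {database}.{table}\n"
        f"(\n{column_lines}\n)\n"
        "ENGINE = MergeTree\n"  # assumed engine, confirm with the reviewer
        f"ORDER BY ({', '.join(order_by)})"
    )


query = build_create_table(DATABASE, TABLE, COLUMNS, ORDER_BY)
print(query)
```

The script only builds a string and does not connect to the cluster, so it is safe to run anywhere. The printed statement is the part that would go into the review.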

All ClickHouse tables are backed by either S3 for immutable records like metrics or DynamoDB for mutable records like GitHub events. In both cases, you will need to:

  1. Write your CREATE TABLE query with the schema and submit it for review, e.g. https://github.com/pytorch/test-infra/pull/5839. Once it's approved, you can create the table yourself using the CH cloud console if you have the necessary permissions, or ping the reviewer to create it for you.
  2. For immutable records on S3, make sure that the workflow that uploads the data has permission to do so. This is usually a 3-step process:
    1. Consult with PyTorch Dev Infra on whether you need a new S3 bucket and, if so, submit a PR to create one at https://github.com/pytorch-labs/pytorch-gha-infra/blob/main/runners/s3_bucket.tf.
    2. Submit a PR to create a new OIDC role or edit an existing one to grant write permission, e.g. https://github.com/pytorch-labs/pytorch-gha-infra/pull/358.
    3. Use the new role in your workflow, e.g. https://github.com/pytorch/executorch/pull/2449.
  3. For mutable records on DynamoDB,