alter_table: Durable Catalog Migration #30163

ParkMyCar · 2024-10-23T14:27:37Z

This PR re-keys everything in the durable Catalog on CatalogItemId instead of GlobalId, and shims everything above the durable Catalog objects using two new methods GlobalId::to_item_id(), and CatalogItemId::to_global_id().

Now every item in the Catalog has the following fields:

id: CatalogItemId, a stable external identifier (i.e. in Catalog tables like mz_objects) that is the same for the entire lifetime of the object.
global_id: GlobalId, a stable internal identifier for this object that can be used by storage and compute.
extra_versions: BTreeMap<Version, GlobalId>, mapping of versions of an object to the GlobalIds used by compute and storage to refer to a specific version.

This de-coupling of CatalogItemId and GlobalId achieves two things:

Externally objects have a stable identifier, even as they are ALTER-ed. This is required for external tools like dbt and Terraform that track objects by ID.
Internally a GlobalId always refers to the same RelationDesc + Persist Shard, this maintains the concept from the formalism that a GlobalId is never re-assigned to a new pTVC.

The implementation of ALTER TABLE ... ADD COLUMN ... will thus allocate a new GlobalId which will immutably refer to that specific version of the table.

Other Changes

Along with ItemKey and ItemValue I updated the following Catalog types:

GidMappingValue: replaced the id field with catalog_id and global_id, used to identify builtin catalog objects.
ClusterIntrospectionSourceIndexValue: replaced the index_id field with catalog_id and global_id, used to identify builtin introspection source indexes.
CommentKey: replaced GlobalId with CatalogItemId, used to identify comments on objects.
SourceReferencesKey: replaced GlobalId with CatalogItemId, used to track references between a Source and the subsources/tables that read from it.

Partial Progress

Today CatalogItemId is 1:1 with GlobalId, this allows us to implement the to_item_id and to_global_id shim methods. Until we support ALTER TABLE ... ADD COLUMN ... we can freely convert between the two. This allows us to break this change up among multiple PRs instead of a single massive change.

Initial Migration

Because CatalogItemId and GlobalId are currently 1:1, this allows us to migrate the raw values of the IDs, e.g. GlobalId::User(42) becomes CatalogItemId::User(42), which is exactly what this PR does.

Motivation

Progress towards https://github.com/MaterializeInc/database-issues/issues/8233
Implements changes described in #30019

Tips for reviewer

This PR is split into two commits:

The durable catalog migration, and updates to durable Catalog objects. I would appreciate the most thorough reviews on this commit.
Shimming all calling code to convert between CatalogItemId and GlobalId.

Checklist

This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.

shepherdlybot · 2024-10-23T16:51:12Z

Mitigations

Completing required mitigations increases Resilience Coverage.

Risk Summary:

The pull request has a high-risk score of 83, driven by predictors such as the average line count in files and executable lines within files. Historically, PRs with these predictors are 160% more likely to cause a bug than the repository baseline. Additionally, the repository has shown an increasing trend with recent spikes in bug fixes.

Note: The risk score is not based on semantic analysis but on historical predictors of bug occurrence in the repository. The attributes above were deemed the strongest predictors based on that history. Predictors and the score may change as the PR evolves in code, time, and review activity.

Bug Hotspots:
What's This?

File	Percentile
../memory/objects.rs	99
../statement/ddl.rs	99
../catalog/state.rs	95

jkosh44

LGTM

jkosh44 · 2024-10-24T13:43:57Z

src/catalog/src/durable/objects.rs

    pub schema_id: SchemaId,
    pub name: String,
    pub create_sql: String,
    pub owner_id: RoleId,
    pub privileges: Vec<MzAclItem>,
+    pub extra_versions: BTreeMap<RelationVersion, GlobalId>,


I'm a little confused how this will be populated. Will we have a single Item entry in the durable catalog per CatalogItemId or a single entry per GlobalId?

Will we have a single Item entry in the durable catalog per CatalogItemId or a single entry per GlobalId?

There will be a single Item entry in the durable catalog per CatalogItemId. Each Item can then also be optionally referred to by multiple GlobalIds, where each GlobalId refers to a specific version of the Item.

So the global_id field will be the most recent GlobalId and everything else will be in extra_versions? Or will the most recent GlobalId also be in extra_versions?

Great question! Right now what I was thinking is the original GlobalId will be in the global_id field, i.e. Version(0). Version(1) and above is what goes into the extra_versions map

jkosh44 · 2024-10-24T13:54:16Z

src/sql/src/names.rs

+            | CommentObjectId::Connection(item_id)
+            | CommentObjectId::Type(item_id)
+            | CommentObjectId::Secret(item_id)
+            | CommentObjectId::ContinualTask(item_id) => ObjectId::Item(item_id.to_global_id()),


I think we probably want to use CatalogItemId here instead of GlobalId

Totally! That switch over will be in the next PR

* bump the version of the catalog to v68, add CatalogItemId * add migration from v67 to v68

* add GlobalId::to_item_id and CatalogItemId::to_global_id methods * shim everything to interop between the two

…istEpoch(Some(4)) Seen in https://buildkite.com/materialize/test/builds/93379 I'm guessing this might have started with MaterializeInc#30163

This PR switches the Adapter/Catalog from referencing items with a `GlobalId`, to a `CatalogItemId`. Internally the compute and storage layers of Materialize will still reference objects by their `GlobalId`, but conceptually a `GlobalId` refers to a "collection of data", while a `CatalogItemId` refers to the database object. This switch allows us to associate multiple `GlobalId`s (aka "collections of data") with a single Catalog object. The primary motivator for this is supporting live schema migrations, or `ALTER TABLE ... ADD COLUMN ...`. When adding a new column to a table we'll create a new `GlobalId`, this ensures that the schema (aka `RelationDesc`) associated with a `GlobalId` never changes. ### Design The durable Catalog migration to add `CatalogItemId` took place in #30163. What took a bit of thought is where to place the associated `GlobalId`s for each object, we have the following requirements: * `Table`: multiple `GlobalId`s * `Source`: single `GlobalId` (maybe eventually multiple?) * `Log`: single `GlobalId` * `View`: single `GlobalId` * `MaterializedView`: single `GlobalId` * `ContinualTask`: single `GlobalId` * `Sink`: single `GlobalId` * `Index`: single `GlobalId` * `Type`: no `GlobalId`s * `Func`: no `GlobalId`s * `Secret`: no `GlobalId`s * `Connection`: no `GlobalId`s `Type`, `Func`, `Secret`, and `Connection` are never referenced by the compute and storage layer so they don't need a `GlobalId`. Meanwhile `Table`s can be evolved so they need to support multiple `GlobalId`s. What complicates this a bit is how we use the `create_sql` for an object as our durable record, while we could come up with some syntax like `CREATE TABLE [<name> WITH u1 u2 u3] ...` as a way to associate multiple `GlobalId`s with an object, this didn't feel great. What I landed on was persisting a single `GlobalId` for all objects, and then including an `extra_versions: BTreeMap<Version, GlobalId>` at the durable layer. While not all types need a `GlobalId`, associating a single `GlobalId` for each object makes things easier to reason about. And while I would like to "design away invalid states", trying to associate multiple `GlobalId`s with only tables added a bunch of boilerplate which I didn't think was worth it. But at the in-memory layer is where we enforce only `Table`s can have multiple `GlobalId`s. Instead of sticking a `global_id: GlobalId` field on CatalogEntry, which might feel natural, I put them on the inner `enum CatalogItem` variants. If a caller has just a `CatalogEntry`, they're forced to reason about their item possibly having multiple `GlobalId`s associated with it. But if they match on the inner `CatalogItem` they can get the inner object and generally get the single `GlobalId`, or if it's a table, reason about which `GlobalId` to use. ### Motivation Implements the approach described in: #30019 Progress towards: MaterializeInc/database-issues#8233 ### Tips for reviewer I tried splitting up this PR into multiple smaller ones, but the amount of shimming required to convert back and forth between `GlobalId` and `CatalogItemId` made the changes very noisy. As such, I split this single PR into multiple commits where each commit migrates a single logical code path. Commits that require the most attention are annotated with a ⭐, and tagged by the teams most directly impacted. 1. ⭐ [storage] Migrates code paths for `Connection`s and `Secret`s to use `CatalogItemId`. This doesn't require much attention, it's largely just a find and replace since these objects are only ever referenced by the Catalog. * Something to call out though is this does change Protobuf definitions which AFAIK are not durably persisted anywhere. But someone from Storage should double check me. 2. ⭐ [adapter] This updates our in-memory objects to store a `GlobalId` per-object, for `Table`s we store multiple `GlobalId`s. To review this commit I would start by looking at `src/catalog/src/memory/objects.rs` to get an understanding of where `GlobalId`s live and how the individual objects were updated. It also updates our durable Catalog transactions to remove the shims introduced in #30163. What's subtle here is we also change the `allocate_user_item_ids(...)` API to return _both_ a `CatalogItemId` and `GlobalId` with the same inner value. This ensures that that string representation of a `CatalogItemId` for an object will always be the same as the `GlobalId`, this allows us to sidestep the issue for now of updating builtin tables to map between `CatalogItemId` and `GlobalId` 3. ⭐ [adapter] Migrates the builtin objects codepaths to associate both a `CatalogItemId` and a `GlobalId` with a builtin object. Builtin objects are handled separately from normal objects, hence having to migrate the separate code path. 4. ⭐ [adapter] Changes SQL name resolution to use `CatalogItemId`. It also changes the `ResolvedIds` newtype from a `BTreeSet<GlobalId>` to a `BTreeMap<CatalogItemId, BTreeSet<GlobalId>>`. This way after name resolution we not only know what objects a statement refers to, but also their underlying collections. This also makes the inner field of `ResolvedIds` private and migrates its callsites. 5. ⭐ [adapter] Adds new APIs to the Catalog to get items by their associated `GlobalId`, and adds a new trait called `CatalogCollectionItem`. I would start reviewing this commit by looking at `src/sql/src/catalog.rs` to get an idea of the API surface. * The new `CatalogCollectionItem` is really only necessary to support `ALTER TABLE` and isn't strictly necessary for this change, but I found that it adds context to the change. The `desc(...)` method is removed from `trait CatalogItem` and moved to `trait CatalogCollectionItem`. You can only get a `CatalogCollectionItem` (and thus `RelationDesc`) if you have a `GlobalId` or a `CatalogItemId` + `Version`. 6. [compute] Updates the `trait OptimizerCatalog` to allow getting an item by a `CatalogItemId`, and updates implementations to use new Catalog APIs. This commit is small and only needs to be skimmed. 7. [adapter] Introduces a new `DependencyIds` field on some Catalog items. `ResolvedIds` are derived from name resolution, meanwhile `DependencyIds` are derived from the `HirRelationExpr` created after planning. * This was added because the `CatalogEntry::uses` method wants to return a list of `CatalogItemId`s, but the `HirRelationExpr` only knows about `GlobalId`s. We could map from `GlobalId` -> `CatalogItemId` when calling the method, but this was a decent refactor on it's own, so I opted to cache these dependencies during planning on the few types of objects that need them. 8. [adapter] Migrates the Coordinator startup/bootstrap codepath to use the new APIs introduced in [5]. No new logic is introduced here, it's largely just a find and replace like `entry.id()` to `mat_view.global_id()`. 9. ⭐ [adapter] Updates all create plans (i.e. `PlanCreate*`) to associate a `GlobalId` with each item. 10. [adapter] Updates plans for `COPY` and `INSERT` to use `CatalogItemId` instead of `GlobalId`, since we also want to insert into collections at their latest version. This commit is largely a find and replace and can be skimmed. 11. [adapter] Updates all drop plans (i.e. `PlanDrop*`) to use `CatalogItemId`, this is pretty much a find and replace and can be skimmed. 12. [adapter] Updates all alter plans (i.e. `PlanAlter*`) to use `CatalogItemId`, this is pretty much a find and replace and can be skimmed. 13. [adapter] Updates all code paths for `COMMENT ON` and Webhook Sources to use `CatalogItemId`. This is a fairly small commit and can definitely be skimmed. 14. ⭐ [adapter/compute]Updates all Peek and Subscribe plans to use `CatalogItemId`. This commit is interesting and could use some extra thought since it's where we map from a catalog item to it's collection, and is one of the primary motivators of this work. 15. [adapter] Updates all `SHOW` and `INSPECT` plans to use `CatalogItemId`. This commit is quite small and can be skimmed. 16. [adapter] Updates Catalog Consistency Checks and `PlanValidity` to use `CatalogItemId`. This commit is quite small and can be skimmed. 17. [adapter] Updates `EXPLAIN` plans to use the new `GlobalId`-based Catalog APIs. It's pretty straight forward and probably doesn't require too much attention. 18. [adapter] This commit is a bit of a "grab bag", it updates code paths for auto-routing queries to `mz_catalog_server`, cluster scheduling, and "caught up" checks to map between `CatalogItemId` and `GlobalId`. It could use a bit of extra attention. 19. ⭐ [adapter/persist] Updates `ScalarType`s to use `CatalogItemId` instead of `GlobalId`. * IMO this is one of the sketchiest commits of the entire PR because we durably persist `ScalarType`s in the Persist schema registry. This change should be okay though since the protobuf type of `CatalogItemId` is exactly the same as `GlobalId` (minus the `Explain` variant) so it is wire compatible. 20. [adapter] Migrates all builtin table writes (aka the `pack_*` method`) to use `CatalogItemId`. 21. Deletes the `GlobalId::to_item_id` and `CatalogItemId::to_global_id` shim methods. ### Checklist - [x] This PR has adequate test coverage / QA involvement has been duly considered. ([trigger-ci for additional test/nightly runs](https://trigger-ci.dev.materialize.com/)) - [x] This PR has an associated up-to-date [design doc](https://github.com/MaterializeInc/materialize/blob/main/doc/developer/design/README.md), is a design doc ([template](https://github.com/MaterializeInc/materialize/blob/main/doc/developer/design/00000000_template.md)), or is sufficiently small to not require a design.  - [x] If this PR evolves [an existing `$T ⇔ Proto$T` mapping](https://github.com/MaterializeInc/materialize/blob/main/doc/developer/command-and-response-binary-encoding.md) (possibly in a backwards-incompatible way), then it is tagged with a `T-proto` label. - [x] If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label ([example](MaterializeInc/cloud#5021)).  - [x] If this PR includes major [user-facing behavior changes](https://github.com/MaterializeInc/materialize/blob/main/doc/developer/guide-changes.md#what-changes-require-a-release-note), I have pinged the relevant PM to schedule a changelog post.

ParkMyCar force-pushed the alter_table2/durable-catalog-migration branch from f4c498b to 7aee756 Compare October 23, 2024 15:24

ParkMyCar marked this pull request as ready for review October 23, 2024 16:50

ParkMyCar requested review from a team as code owners October 23, 2024 16:50

ParkMyCar requested a review from jkosh44 October 23, 2024 16:50

jkosh44 approved these changes Oct 24, 2024

View reviewed changes

ParkMyCar added 2 commits October 24, 2024 16:18

start, durable catalog migration

caf77e3

* bump the version of the catalog to v68, add CatalogItemId * add migration from v67 to v68

shim relevant parts

8b0b81c

* add GlobalId::to_item_id and CatalogItemId::to_global_id methods * shim everything to interop between the two

ParkMyCar force-pushed the alter_table2/durable-catalog-migration branch from 7aee756 to 8b0b81c Compare October 24, 2024 20:46

ParkMyCar added 2 commits October 25, 2024 10:04

update GidMappingValue protobuf type to account for 0dt upgrades

2d2dd9d

update the v68 snapshot

99631f3

ParkMyCar enabled auto-merge (squash) October 25, 2024 14:45

ParkMyCar disabled auto-merge October 25, 2024 15:13

one more workaround for 0dt

a4653d6

ParkMyCar enabled auto-merge (squash) October 25, 2024 15:34

ParkMyCar merged commit 1945b14 into MaterializeInc:main Oct 25, 2024
80 checks passed

def- mentioned this pull request Oct 26, 2024

ci: Ignore halts: fenced by envd @ PersistEpoch(Some(6)). ours = PersistEpoch(Some(4)) #30206

Merged

5 tasks

ParkMyCar mentioned this pull request Oct 28, 2024

adapter: Switch from GlobalId to CatalogItemId #30189

Merged

5 tasks

ParkMyCar deleted the alter_table2/durable-catalog-migration branch November 18, 2024 16:45

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

alter_table: Durable Catalog Migration #30163

alter_table: Durable Catalog Migration #30163

ParkMyCar commented Oct 23, 2024 •

edited

Loading

shepherdlybot bot commented Oct 23, 2024 •

edited

Loading

jkosh44 left a comment

jkosh44 Oct 24, 2024

ParkMyCar Oct 24, 2024

jkosh44 Oct 24, 2024

ParkMyCar Oct 25, 2024

jkosh44 Oct 24, 2024

ParkMyCar Oct 24, 2024

alter_table: Durable Catalog Migration #30163

alter_table: Durable Catalog Migration #30163

Conversation

ParkMyCar commented Oct 23, 2024 • edited Loading

Other Changes

Partial Progress

Initial Migration

Motivation

Tips for reviewer

Checklist

shepherdlybot bot commented Oct 23, 2024 • edited Loading

Mitigations

jkosh44 left a comment

Choose a reason for hiding this comment

jkosh44 Oct 24, 2024

Choose a reason for hiding this comment

ParkMyCar Oct 24, 2024

Choose a reason for hiding this comment

jkosh44 Oct 24, 2024

Choose a reason for hiding this comment

ParkMyCar Oct 25, 2024

Choose a reason for hiding this comment

jkosh44 Oct 24, 2024

Choose a reason for hiding this comment

ParkMyCar Oct 24, 2024

Choose a reason for hiding this comment

ParkMyCar commented Oct 23, 2024 •

edited

Loading

shepherdlybot bot commented Oct 23, 2024 •

edited

Loading