Skip to content

Commit 1945b14

Browse files
authored
alter_table: Durable Catalog Migration (#30163)
This PR re-keys everything in the durable Catalog on `CatalogItemId` instead of `GlobalId`, and shims everything above the durable Catalog objects using two new methods `GlobalId::to_item_id()`, and `CatalogItemId::to_global_id()`. Now every item in the Catalog has the following fields: * `id: CatalogItemId`, a stable _external_ identifier (i.e. in Catalog tables like `mz_objects`) that is the same for the entire lifetime of the object. * `global_id: GlobalId`, a stable _internal_ identifier for this object that can be used by storage and compute. * `extra_versions: BTreeMap<Version, GlobalId>`, mapping of versions of an object to the `GlobalId`s used by compute and storage to refer to a specific version. This de-coupling of `CatalogItemId` and `GlobalId` achieves two things: 1. Externally objects have a stable identifier, even as they are `ALTER`-ed. This is required for external tools like dbt and Terraform that track objects by ID. 2. Internally a `GlobalId` always refers to the same `RelationDesc` + Persist Shard, this maintains the concept from the formalism that a `GlobalId` is never re-assigned to a new pTVC. The implementation of `ALTER TABLE ... ADD COLUMN ...` will thus allocate a new `GlobalId` which will immutably refer to that specific version of the table. #### Other Changes Along with `ItemKey` and `ItemValue` I updated the following Catalog types: * `GidMappingValue`: replaced the `id` field with `catalog_id` and `global_id`, used to identify builtin catalog objects. * `ClusterIntrospectionSourceIndexValue`: replaced the `index_id` field with `catalog_id` and `global_id`, used to identify builtin introspection source indexes. * `CommentKey`: replaced `GlobalId` with `CatalogItemId`, used to identify comments on objects. * `SourceReferencesKey`: replaced `GlobalId` with `CatalogItemId`, used to track references between a Source and the subsources/tables that read from it. #### Partial Progress Today `CatalogItemId` is 1:1 with `GlobalId`, this allows us to implement the `to_item_id` and `to_global_id` shim methods. Until we support `ALTER TABLE ... ADD COLUMN ...` we can freely convert between the two. This allows us to break this change up among multiple PRs instead of a single massive change. #### Initial Migration Because `CatalogItemId` and `GlobalId` are currently 1:1, this allows us to migrate the raw values of the IDs, e.g. `GlobalId::User(42)` becomes `CatalogItemId::User(42)`, which is exactly what this PR does. ### Motivation Progress towards https://github.com/MaterializeInc/database-issues/issues/8233 Implements changes described in #30019 ### Tips for reviewer This PR is split into two commits: 1. The durable catalog migration, and updates to durable Catalog objects. I would appreciate the most thorough reviews on this commit. 4. Shimming all calling code to convert between `CatalogItemId` and `GlobalId`. ### Checklist - [ ] This PR has adequate test coverage / QA involvement has been duly considered. ([trigger-ci for additional test/nightly runs](https://trigger-ci.dev.materialize.com/)) - [ ] This PR has an associated up-to-date [design doc](https://github.com/MaterializeInc/materialize/blob/main/doc/developer/design/README.md), is a design doc ([template](https://github.com/MaterializeInc/materialize/blob/main/doc/developer/design/00000000_template.md)), or is sufficiently small to not require a design. <!-- Reference the design in the description. --> - [ ] If this PR evolves [an existing `$T ⇔ Proto$T` mapping](https://github.com/MaterializeInc/materialize/blob/main/doc/developer/command-and-response-binary-encoding.md) (possibly in a backwards-incompatible way), then it is tagged with a `T-proto` label. - [ ] If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label ([example](https://github.com/MaterializeInc/cloud/pull/5021)). <!-- Ask in #team-cloud on Slack if you need help preparing the cloud PR. --> - [ ] If this PR includes major [user-facing behavior changes](https://github.com/MaterializeInc/materialize/blob/main/doc/developer/guide-changes.md#what-changes-require-a-release-note), I have pinged the relevant PM to schedule a changelog post.
1 parent 4d848ec commit 1945b14

24 files changed

Lines changed: 1940 additions & 201 deletions

File tree

src/adapter/src/catalog/apply.rs

Lines changed: 37 additions & 21 deletions
Original file line numberDiff line numberDiff line change
@@ -516,7 +516,7 @@ impl CatalogState {
516516
diff: StateDiff,
517517
retractions: &mut InProgressRetractions,
518518
) {
519-
let id = system_object_mapping.unique_identifier.id;
519+
let id = system_object_mapping.unique_identifier.global_id;
520520

521521
if system_object_mapping.unique_identifier.runtime_alterable() {
522522
// Runtime-alterable system objects have real entries in the items
@@ -871,13 +871,15 @@ impl CatalogState {
871871
StateDiff::Addition => {
872872
let key = item.key();
873873
let mz_catalog::durable::Item {
874-
id,
874+
id: _,
875875
oid,
876876
schema_id,
877877
name,
878878
create_sql,
879879
owner_id,
880880
privileges,
881+
global_id,
882+
extra_versions: _,
881883
} = item;
882884
let schema = self.find_non_temp_schema(&schema_id);
883885
let name = QualifiedItemName {
@@ -889,18 +891,23 @@ impl CatalogState {
889891
};
890892
let entry = match retractions.items.remove(&key) {
891893
Some(mut retraction) => {
892-
assert_eq!(retraction.id, item.id);
894+
// TODO(alter_table): Switch this to CatalogItemId.
895+
assert_eq!(retraction.id, item.global_id);
893896
// We only reparse the SQL if it's changed. Otherwise, we use the existing
894897
// item. This is a performance optimization and not needed for correctness.
895898
// This makes it difficult to use the `UpdateFrom` trait, but the structure
896899
// is still the same as the trait.
900+
//
901+
// TODO(alter_table): Switch this to CatalogItemId.
897902
if retraction.create_sql() != create_sql {
898-
let item = self.deserialize_item(id, &create_sql).unwrap_or_else(|e| {
899-
panic!("{e:?}: invalid persisted SQL: {create_sql}")
900-
});
903+
let item = self
904+
.deserialize_item(global_id, &create_sql)
905+
.unwrap_or_else(|e| {
906+
panic!("{e:?}: invalid persisted SQL: {create_sql}")
907+
});
901908
retraction.item = item;
902909
}
903-
retraction.id = id;
910+
retraction.id = global_id;
904911
retraction.oid = oid;
905912
retraction.name = name;
906913
retraction.owner_id = owner_id;
@@ -909,15 +916,17 @@ impl CatalogState {
909916
retraction
910917
}
911918
None => {
912-
let catalog_item =
913-
self.deserialize_item(id, &create_sql).unwrap_or_else(|e| {
919+
// TODO(alter_table): Switch this to CatalogItemId.
920+
let catalog_item = self
921+
.deserialize_item(global_id, &create_sql)
922+
.unwrap_or_else(|e| {
914923
panic!("{e:?}: invalid persisted SQL: {create_sql}")
915924
});
916925
CatalogEntry {
917926
item: catalog_item,
918927
referenced_by: Vec::new(),
919928
used_by: Vec::new(),
920-
id,
929+
id: global_id,
921930
oid,
922931
name,
923932
owner_id,
@@ -939,7 +948,8 @@ impl CatalogState {
939948
self.insert_entry(entry);
940949
}
941950
StateDiff::Retraction => {
942-
let entry = self.drop_item(item.id);
951+
// TODO(alter_table): Switch this to CatalogItemId.
952+
let entry = self.drop_item(item.global_id);
943953
let key = item.into_key_value().0;
944954
retractions.items.insert(key, entry);
945955
}
@@ -990,16 +1000,19 @@ impl CatalogState {
9901000
) {
9911001
match diff {
9921002
StateDiff::Addition => {
993-
let prev = self
994-
.source_references
995-
.insert(source_references.source_id, source_references.into());
1003+
let prev = self.source_references.insert(
1004+
source_references.source_id.to_global_id(),
1005+
source_references.into(),
1006+
);
9961007
assert!(
9971008
prev.is_none(),
9981009
"values must be explicitly retracted before inserting a new value: {prev:?}"
9991010
);
10001011
}
10011012
StateDiff::Retraction => {
1002-
let prev = self.source_references.remove(&source_references.source_id);
1013+
let prev = self
1014+
.source_references
1015+
.remove(&source_references.source_id.to_global_id());
10031016
assert!(
10041017
prev.is_some(),
10051018
"retraction for a non-existent existing value: {source_references:?}"
@@ -1121,13 +1134,13 @@ impl CatalogState {
11211134
// items collection and so get handled through the normal
11221135
// `StateUpdateKind::Item`.`
11231136
if !system_object_mapping.unique_identifier.runtime_alterable() {
1124-
self.pack_item_update(system_object_mapping.unique_identifier.id, diff)
1137+
self.pack_item_update(system_object_mapping.unique_identifier.global_id, diff)
11251138
} else {
11261139
vec![]
11271140
}
11281141
}
11291142
StateUpdateKind::TemporaryItem(item) => self.pack_item_update(item.id, diff),
1130-
StateUpdateKind::Item(item) => self.pack_item_update(item.id, diff),
1143+
StateUpdateKind::Item(item) => self.pack_item_update(item.global_id, diff),
11311144
StateUpdateKind::Comment(comment) => vec![self.pack_comment_update(
11321145
comment.object_id,
11331146
comment.sub_component,
@@ -1720,7 +1733,8 @@ fn sort_updates_inner(updates: Vec<StateUpdate>) -> Vec<StateUpdate> {
17201733
if item.create_sql.starts_with("CREATE SINK") {
17211734
GlobalId::User(u64::MAX)
17221735
} else {
1723-
item.id
1736+
// TODO(alter_table): Switch this to CatalogItemId.
1737+
item.global_id
17241738
}
17251739
})
17261740
.collect()
@@ -1757,14 +1771,16 @@ fn sort_updates_inner(updates: Vec<StateUpdate>) -> Vec<StateUpdate> {
17571771
while let (Some((item, _, _)), Some((temp_item, _, _))) =
17581772
(item_updates.front(), temp_item_updates.front())
17591773
{
1760-
if item.id < temp_item.id {
1774+
// TODO(alter_table): Switch this to CatalogItemId.
1775+
if item.global_id < temp_item.id {
17611776
let (item, ts, diff) = item_updates.pop_front().expect("non-empty");
17621777
state_updates.push(StateUpdate {
17631778
kind: StateUpdateKind::Item(item),
17641779
ts,
17651780
diff,
17661781
});
1767-
} else if item.id > temp_item.id {
1782+
// TODO(alter_table): Switch this to CatalogItemId.
1783+
} else if item.global_id > temp_item.id {
17681784
let (temp_item, ts, diff) = temp_item_updates.pop_front().expect("non-empty");
17691785
state_updates.push(StateUpdate {
17701786
kind: StateUpdateKind::TemporaryItem(temp_item),
@@ -2009,5 +2025,5 @@ fn lookup_builtin_view_addition(
20092025
let (_, builtin) = BUILTIN_LOOKUP
20102026
.get(&system_object_mapping.description)
20112027
.expect("missing builtin view");
2012-
(*builtin, system_object_mapping.unique_identifier.id)
2028+
(*builtin, system_object_mapping.unique_identifier.global_id)
20132029
}

src/adapter/src/catalog/consistency.rs

Lines changed: 12 additions & 12 deletions
Original file line numberDiff line numberDiff line change
@@ -234,18 +234,18 @@ impl CatalogState {
234234
let mut comment_inconsistencies = Vec::new();
235235
for (comment_object_id, col_pos, _comment) in self.comments.iter() {
236236
match comment_object_id {
237-
CommentObjectId::Table(global_id)
238-
| CommentObjectId::View(global_id)
239-
| CommentObjectId::MaterializedView(global_id)
240-
| CommentObjectId::Source(global_id)
241-
| CommentObjectId::Sink(global_id)
242-
| CommentObjectId::Index(global_id)
243-
| CommentObjectId::Func(global_id)
244-
| CommentObjectId::Connection(global_id)
245-
| CommentObjectId::Type(global_id)
246-
| CommentObjectId::Secret(global_id)
247-
| CommentObjectId::ContinualTask(global_id) => {
248-
let entry = self.entry_by_id.get(&global_id);
237+
CommentObjectId::Table(item_id)
238+
| CommentObjectId::View(item_id)
239+
| CommentObjectId::MaterializedView(item_id)
240+
| CommentObjectId::Source(item_id)
241+
| CommentObjectId::Sink(item_id)
242+
| CommentObjectId::Index(item_id)
243+
| CommentObjectId::Func(item_id)
244+
| CommentObjectId::Connection(item_id)
245+
| CommentObjectId::Type(item_id)
246+
| CommentObjectId::Secret(item_id)
247+
| CommentObjectId::ContinualTask(item_id) => {
248+
let entry = self.entry_by_id.get(&item_id.to_global_id());
249249
match entry {
250250
None => comment_inconsistencies
251251
.push(CommentInconsistency::Dangling(comment_object_id)),

src/adapter/src/catalog/migrate.rs

Lines changed: 6 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -36,11 +36,12 @@ where
3636

3737
for mut item in tx.get_items() {
3838
let mut stmt = mz_sql::parse::parse(&item.create_sql)?.into_element().ast;
39-
f(tx, item.id, &mut stmt).await?;
39+
// TODO(alter_table): Switch this to CatalogItemId.
40+
f(tx, item.global_id, &mut stmt).await?;
4041

4142
item.create_sql = stmt.to_ast_string_stable();
4243

43-
updated_items.insert(item.id, item);
44+
updated_items.insert(item.global_id, item);
4445
}
4546
tx.update_items(updated_items)?;
4647
Ok(())
@@ -63,12 +64,12 @@ where
6364
let items = tx.get_items();
6465
for mut item in items {
6566
let mut stmt = mz_sql::parse::parse(&item.create_sql)?.into_element().ast;
66-
67-
f(tx, &cat, item.id, &mut stmt).await?;
67+
// TODO(alter_table): Switch this to CatalogItemId.
68+
f(tx, &cat, item.global_id, &mut stmt).await?;
6869

6970
item.create_sql = stmt.to_ast_string_stable();
7071

71-
updated_items.insert(item.id, item);
72+
updated_items.insert(item.global_id, item);
7273
}
7374
tx.update_items(updated_items)?;
7475
Ok(())

src/adapter/src/catalog/open.rs

Lines changed: 9 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -664,7 +664,8 @@ impl Catalog {
664664
object_name: entry.name().item.clone(),
665665
},
666666
unique_identifier: SystemObjectUniqueIdentifier {
667-
id: new_id,
667+
catalog_id: new_id.to_item_id(),
668+
global_id: new_id,
668669
fingerprint: fingerprint.clone(),
669670
},
670671
},
@@ -949,7 +950,7 @@ fn add_new_remove_old_builtin_items_migration(
949950
!builtin.runtime_alterable(),
950951
"setting the runtime alterable flag on an existing object is not permitted"
951952
);
952-
migrated_builtin_ids.push(system_object_mapping.unique_identifier.id);
953+
migrated_builtin_ids.push(system_object_mapping.unique_identifier.global_id);
953954
}
954955
}
955956

@@ -961,7 +962,11 @@ fn add_new_remove_old_builtin_items_migration(
961962
object_type: builtin.catalog_item_type(),
962963
object_name: builtin.name().to_string(),
963964
},
964-
unique_identifier: SystemObjectUniqueIdentifier { id, fingerprint },
965+
unique_identifier: SystemObjectUniqueIdentifier {
966+
catalog_id: id.to_item_id(),
967+
global_id: id,
968+
fingerprint,
969+
},
965970
});
966971

967972
// Runtime-alterable system objects are durably recorded to the
@@ -1004,7 +1009,7 @@ fn add_new_remove_old_builtin_items_migration(
10041009
for (_, mapping) in system_object_mappings {
10051010
deleted_system_objects.insert(mapping.description);
10061011
if mapping.unique_identifier.fingerprint == RUNTIME_ALTERABLE_FINGERPRINT_SENTINEL {
1007-
deleted_runtime_alterable_system_ids.insert(mapping.unique_identifier.id);
1012+
deleted_runtime_alterable_system_ids.insert(mapping.unique_identifier.global_id);
10081013
}
10091014
}
10101015
// If you are 100% positive that it is safe to delete a system object outside any of the

src/adapter/src/catalog/open/builtin_item_migration.rs

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -211,7 +211,8 @@ async fn migrate_builtin_items_0dt(
211211
object_name: entry.name().item.clone(),
212212
},
213213
unique_identifier: SystemObjectUniqueIdentifier {
214-
id: *id,
214+
catalog_id: id.to_item_id(),
215+
global_id: *id,
215216
fingerprint: fingerprint.clone(),
216217
},
217218
},

src/adapter/src/catalog/state.rs

Lines changed: 14 additions & 15 deletions
Original file line numberDiff line numberDiff line change
@@ -1742,20 +1742,19 @@ impl CatalogState {
17421742
match object_id {
17431743
ObjectId::Item(global_id) => {
17441744
let entry = self.get_entry(&global_id);
1745+
let item_id = global_id.to_item_id();
17451746
match entry.item_type() {
1746-
CatalogItemType::Table => CommentObjectId::Table(global_id),
1747-
CatalogItemType::Source => CommentObjectId::Source(global_id),
1748-
CatalogItemType::Sink => CommentObjectId::Sink(global_id),
1749-
CatalogItemType::View => CommentObjectId::View(global_id),
1750-
CatalogItemType::MaterializedView => {
1751-
CommentObjectId::MaterializedView(global_id)
1752-
}
1753-
CatalogItemType::Index => CommentObjectId::Index(global_id),
1754-
CatalogItemType::Func => CommentObjectId::Func(global_id),
1755-
CatalogItemType::Connection => CommentObjectId::Connection(global_id),
1756-
CatalogItemType::Type => CommentObjectId::Type(global_id),
1757-
CatalogItemType::Secret => CommentObjectId::Secret(global_id),
1758-
CatalogItemType::ContinualTask => CommentObjectId::ContinualTask(global_id),
1747+
CatalogItemType::Table => CommentObjectId::Table(item_id),
1748+
CatalogItemType::Source => CommentObjectId::Source(item_id),
1749+
CatalogItemType::Sink => CommentObjectId::Sink(item_id),
1750+
CatalogItemType::View => CommentObjectId::View(item_id),
1751+
CatalogItemType::MaterializedView => CommentObjectId::MaterializedView(item_id),
1752+
CatalogItemType::Index => CommentObjectId::Index(item_id),
1753+
CatalogItemType::Func => CommentObjectId::Func(item_id),
1754+
CatalogItemType::Connection => CommentObjectId::Connection(item_id),
1755+
CatalogItemType::Type => CommentObjectId::Type(item_id),
1756+
CatalogItemType::Secret => CommentObjectId::Secret(item_id),
1757+
CatalogItemType::ContinualTask => CommentObjectId::ContinualTask(item_id),
17591758
}
17601759
}
17611760
ObjectId::Role(role_id) => CommentObjectId::Role(role_id),
@@ -2135,7 +2134,7 @@ impl CatalogState {
21352134
| CommentObjectId::Connection(id)
21362135
| CommentObjectId::Type(id)
21372136
| CommentObjectId::Secret(id)
2138-
| CommentObjectId::ContinualTask(id) => Some(*id),
2137+
| CommentObjectId::ContinualTask(id) => Some(id.to_global_id()),
21392138
CommentObjectId::Role(_)
21402139
| CommentObjectId::Database(_)
21412140
| CommentObjectId::Schema(_)
@@ -2165,7 +2164,7 @@ impl CatalogState {
21652164
| CommentObjectId::Type(id)
21662165
| CommentObjectId::Secret(id)
21672166
| CommentObjectId::ContinualTask(id) => {
2168-
let item = self.get_entry(&id);
2167+
let item = self.get_entry(&id.to_global_id());
21692168
let name = self.resolve_full_name(item.name(), Some(conn_id));
21702169
name.to_string()
21712170
}

src/buf.yaml

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -34,6 +34,8 @@ breaking:
3434
# reason: does currently not require backward-compatibility
3535
- catalog/protos/objects_v67.proto
3636
# reason: does currently not require backward-compatibility
37+
- catalog/protos/objects_v68.proto
38+
# reason: does currently not require backward-compatibility
3739
- cluster-client/src/client.proto
3840
# reason: does currently not require backward-compatibility
3941
- compute-client/src/logging.proto

src/catalog/build.rs

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -148,6 +148,7 @@ fn main() -> anyhow::Result<()> {
148148
.enum_attribute("CatalogItem.value", ATTR)
149149
.enum_attribute("ClusterConfig.variant", ATTR)
150150
.enum_attribute("GlobalId.value", ATTR)
151+
.enum_attribute("CatalogItemId.value", ATTR)
151152
.enum_attribute("ClusterId.value", ATTR)
152153
.enum_attribute("DatabaseId.value", ATTR)
153154
.enum_attribute("SchemaId.value", ATTR)

src/catalog/protos/hashes.json

Lines changed: 5 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,7 +1,7 @@
11
[
22
{
33
"name": "objects.proto",
4-
"md5": "64474ac4b2cf2f9aca7a0d772c4548e6"
4+
"md5": "b023c4e7ca71ae263d80a609dd586c72"
55
},
66
{
77
"name": "objects_v60.proto",
@@ -34,5 +34,9 @@
3434
{
3535
"name": "objects_v67.proto",
3636
"md5": "ce8acf8bc724dc3121e3014555f00250"
37+
},
38+
{
39+
"name": "objects_v68.proto",
40+
"md5": "f9ae9b93103620bce86b07c0a35b9c6d"
3741
}
3842
]

0 commit comments

Comments
 (0)