storage/adapter: Opt-in migration of sources to the new table model #30168

Closed
rjobanp wants to merge 26 commits from the source-table-migration branch

Conversation

@rjobanp (Contributor) commented Oct 23, 2024

Motivation

This PR adds an opt-in migration of existing sources to the new table model, gated behind the force_source_table_syntax feature flag. The subsequent PR will implement https://github.com/MaterializeInc/database-issues/issues/8678, which will also disable use of the 'old style' source statements using the same feature flag introduced here. Once both PRs land, enabling the force_source_table_syntax flag will completely switch users over to the new syntax.
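
To make the switch concrete, here is a rough before/after sketch for a Postgres source (illustrative only; the connection, publication, and table names are made up, and the exact options may differ):

-- 'Old style': tables are ingested directly via the source definition.
CREATE SOURCE pg_src
  FROM POSTGRES CONNECTION pg_conn (PUBLICATION 'mz_source')
  FOR TABLES (t1);

-- New table model: the source is created on its own, and tables are
-- attached to it explicitly.
CREATE SOURCE pg_src
  FROM POSTGRES CONNECTION pg_conn (PUBLICATION 'mz_source');
CREATE TABLE t1 FROM SOURCE pg_src (REFERENCE t1);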

Tips for reviewer

To test this I've added a new platform-checks scenario called ActivateSourceVersioningMigration. It runs Materialize on an existing version for each check's initialize() method, then restarts Materialize on the latest version with force_source_table_syntax enabled, activating the migration of any sources created using the 'old style' syntax. The validate() step is then run on this new version, confirming that all the queries continue to work.

There are already existing platform-checks Checks that use the 'old style' source syntax: TableFromPgSource, TableFromMySqlSource, LoadGeneratorAsOfUpTo, and one I added called UpsertLegacy; together they cover the four source types we need to test. There are also many other checks that use the old syntax when running on 'old' versions before 0.119, but I wasn't sure how to make the ActivateSourceVersioningMigration scenario target a specific version rather than just the 'previous' version for the base run. @def- @nrainer-materialize let me know if you have ideas on doing that.

I've also updated the legacy upgrade tests to activate this migration after the upgrade, which should provide additional coverage.

Nightly

https://buildkite.com/materialize/nightly/builds?branch=rjobanp%3Asource-table-migration

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
  • If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.

@rjobanp rjobanp force-pushed the source-table-migration branch 3 times, most recently from 0d33ba1 to 0a8d418 on October 24, 2024 21:15
@rjobanp rjobanp marked this pull request as ready for review October 24, 2024 21:22
@rjobanp rjobanp requested review from a team as code owners October 24, 2024 21:22
@rjobanp rjobanp force-pushed the source-table-migration branch from 0a8d418 to d6b2910 on October 24, 2024 21:23
$ kafka-ingest format=avro key-format=avro topic=upsert-legacy-syntax key-schema=${keyschema} schema=${schema} repeat=10000
{"key1": "A${kafka-ingest.iteration}"} {"f1": "A${kafka-ingest.iteration}"}

> CREATE SOURCE upsert_insert
Contributor

I think we will need to use more variants of the syntax here. I will add some more.

@nrainer-materialize (Contributor)

> I wasn't sure how to make the ActivateSourceVersioningMigration scenario target a specific version rather than just the 'previous' version for the base run.

I guess we could just hard-code the version returned by ActivateSourceVersioningMigration's def base_version(self) -> MzVersion to a version before the migration.
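
For instance, a minimal sketch of that override (assuming the scenario subclasses the usual platform-checks Scenario base class; the import paths and the pinned version are examples, not the actual pre-migration cut-off):

from materialize.checks.scenarios import Scenario  # assumed module path
from materialize.mz_version import MzVersion  # assumed module path

class ActivateSourceVersioningMigration(Scenario):
    def base_version(self) -> MzVersion:
        # Pin the base run to a release that still uses the old source
        # syntax (example version only, pick as appropriate).
        return MzVersion.parse_mz("v0.118.0")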

@nrainer-materialize (Contributor)

How would we notice that a migration did not work? Do we need to query a source for that, or would the failure already occur during startup?

We kept the old testdrive files from before the test migration. I could write a test setup that runs those files with an older version and then migrates to a newer version. What do you think?

@nrainer-materialize (Contributor)

> We kept the old testdrive files from before the test migration. I could write a test setup that runs those files with an older version and then migrates to a newer version. What do you think?

Something like this: d5dd0c6

@def-, what do I need to change to keep the data from the previous mz instance? When I start the new instance, it is empty.

@rjobanp (Contributor Author) commented Oct 25, 2024

> How would we notice that a migration did not work? Do we need to query a source for that, or would the failure already occur during startup?

If the migration fails, the ActivateSourceVersioningMigration scenario will fail to start Materialize before the validate() step for any check. This is how I tested the migration locally, by running:

bin/mzcompose --find platform-checks down && bin/mzcompose --find platform-checks --dev run default  --scenario=ActivateSourceVersioningMigration --check=TableFromPgSource --check=LoadGeneratorAsOfUpTo --check=UpsertLegacy

for example

@def- (Contributor) commented Oct 25, 2024

> @def-, what do I need to change to keep the data from the previous mz instance? When I start the new instance, it is empty.

Testdrive has a no_reset=True parameter for that.

@jkosh44 (Contributor) left a comment

I didn't finish reviewing, but I have to step away for a sec. Here are some initial comments/questions.

src/adapter/src/catalog/migrate.rs: 3 resolved review threads (outdated)
@nrainer-materialize (Contributor)

> @def-, what do I need to change to keep the data from the previous mz instance? When I start the new instance, it is empty.

> Testdrive has a no_reset=True parameter for that.

I used "--no-reset".

@nrainer-materialize (Contributor)

> If the migration fails, the ActivateSourceVersioningMigration scenario will fail to start Materialize before the validate() step for any check. This is how I tested the migration locally, by running:

What is the recommended way to ensure that all sources were migrated, i.e. that no unmigrated sources are still around? Go through the sources and use SHOW CREATE SOURCE?

@def- (Contributor) commented Oct 25, 2024

> I used "--no-reset".

Then start Materialize with external_metadata_store=True.
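
Combined, a minimal sketch of the relevant service definitions for such an mzcompose-based test (the import paths and parameters are assumptions based on the discussion above, not the actual diff):

from materialize.mzcompose.services.materialized import Materialized
from materialize.mzcompose.services.testdrive import Testdrive

SERVICES = [
    # Keep catalog/persist metadata outside the mz container so the data
    # survives the restart from the old image onto the new one.
    Materialized(external_metadata_store=True),
    # Don't reset state between testdrive runs, so sources created on the
    # old version are still around after the upgrade.
    Testdrive(no_reset=True),
]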

@rjobanp (Contributor Author) commented Oct 25, 2024

> What is the recommended way to ensure that all sources were migrated, i.e. that no unmigrated sources are still around? Go through the sources and use SHOW CREATE SOURCE?

Yes @nrainer-materialize, that's what we need to do in the workflow you added; it would have caught this bug that Joe spotted: #30168 (comment)
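
For example, the workflow could assert something along these lines after the upgrade (a sketch; the catalog query is illustrative and the exact check may differ):

-- Enumerate the user-defined sources that should have been migrated.
SELECT name, type FROM mz_sources WHERE id LIKE 'u%';

-- For each of them, confirm the printed definition uses the new table
-- model rather than the legacy 'old style' syntax.
SHOW CREATE SOURCE upsert_insert;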

src/adapter/src/catalog/migrate.rs: 3 resolved review threads (outdated)
Comment on lines +2182 to +2196
if let RawItemName::Id(id, _, _) = item_name {
let parsed_id = id.parse::<GlobalId>().unwrap();
if let Some(new_id) = self.id_map.get(&parsed_id) {
*id = new_id.to_string();
self.modified = true;
}
}
Contributor

What happens if we encounter a RawItemName::Name? Is that unexpected? If so, we should add an else branch with a panic/assert/unreachable.

Contributor Author

My thought was that if it is a RawItemName::Name, there is no contained id to update. This method is intended only to update ids, and it assumes the caller takes care of ensuring the name is carried over from the item with the current GlobalId to the new one.

Contributor

Ah ok I see, the name will resolve to the new ID when it's looked up. That LGTM then.

@nrainer-materialize (Contributor)

When adding my test, I am running into this issue:

postgres-metadata-1  | 2024-10-25 15:31:28.207 UTC [266] ERROR:  could not serialize access due to read/write dependencies among transactions
postgres-metadata-1  | 2024-10-25 15:31:28.207 UTC [266] DETAIL:  Reason code: Canceled on identification as a pivot, during write.
postgres-metadata-1  | 2024-10-25 15:31:28.207 UTC [266] HINT:  The transaction might succeed if retried.
postgres-metadata-1  | 2024-10-25 15:31:28.207 UTC [266] STATEMENT:  
postgres-metadata-1  | 	                INSERT INTO consensus (shard, sequence_number, data)
postgres-metadata-1  | 	                SELECT $1, $2, $3
postgres-metadata-1  | 	                WHERE (SELECT sequence_number FROM consensus
postgres-metadata-1  | 	                       WHERE shard = $1
postgres-metadata-1  | 	                       ORDER BY sequence_number DESC LIMIT 1) = $4;

Any idea what this is? Could this be a migration problem?

@rjobanp (Contributor Author) commented Oct 25, 2024

@nrainer-materialize that looks like a persist issue with using metadata_store="postgres-metadata" instead of the default Cockroach metadata store. Can you remove that argument for the Materialize instances in your new test? As far as I know that option is not currently stable.

@nrainer-materialize (Contributor)

> @nrainer-materialize that looks like a persist issue with using metadata_store="postgres-metadata" instead of the default Cockroach metadata store. Can you remove that argument for the Materialize instances in your new test? As far as I know that option is not currently stable.

I figured it out together with Dennis. The issue was that I was going from v0.122.0 to v0.122.0-dev.
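
(For context: a -dev build is a pre-release of the corresponding version, so moving from v0.122.0 to v0.122.0-dev is effectively a downgrade rather than an upgrade. A quick illustration using the version helper quoted further down, assuming MzVersion follows semver ordering and the materialize.mz_version import path:)

from materialize.mz_version import MzVersion  # assumed module path

# Pre-release builds sort before the release they lead up to, so this
# "upgrade" is really a downgrade.
assert MzVersion.parse_mz("v0.122.0-dev") < MzVersion.parse_mz("v0.122.0")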

@nrainer-materialize (Contributor)

I continued working on pg-cdc-migration and triggered https://buildkite.com/materialize/nightly/builds/10160.
I will continue on Monday morning with MySQL and Kafka if the approach works.

@rjobanp rjobanp force-pushed the source-table-migration branch from 9f4276f to d4aa1bd on October 29, 2024 19:20
rjobanp and others added 3 commits October 29, 2024 17:54
…rce-table-migration

# Conflicts:
#	src/adapter/src/catalog/apply.rs
#	src/adapter/src/catalog/migrate.rs
#	src/adapter/src/catalog/open.rs
#	src/catalog/src/durable/transaction.rs
#	src/sql/src/names.rs
#	test/testdrive-old-kafka-src-syntax/mzcompose.py
@jkosh44 (Contributor) commented Nov 14, 2024

I'm switching to a new PR from my own fork: #30483. I can't get CI to build this since the author is no longer at the company.

Comment on lines +60 to +67
def check_source_table_migration_test_sensible() -> None:
assert MzVersion.parse_cargo() < MzVersion.parse_mz(
"v0.130.0"
), "migration test probably no longer needed"


def get_old_image_for_source_table_migration_test() -> str:
return "materialize/materialized:v0.122.0"
Contributor

@def- Any chance you know what the significance of these two versions is?

Contributor

It's probably just supposed to be an old version to upgrade from; I don't think it actually matters.
