storage/adapter: Opt-in migration of sources to the new table model #30168
Conversation
Force-pushed from 0d33ba1 to 0a8d418.
Force-pushed from 0a8d418 to d6b2910.
$ kafka-ingest format=avro key-format=avro topic=upsert-legacy-syntax key-schema=${keyschema} schema=${schema} repeat=10000
{"key1": "A${kafka-ingest.iteration}"} {"f1": "A${kafka-ingest.iteration}"}

> CREATE SOURCE upsert_insert
I think we will need to use more variants of the syntax here. I will add some more.
I guess we could just hard-code the version returned in …
How would we notice that a migration did not work? Do we need to query a source for that, or would the failure already occur during startup? We kept old testdrive files from before the test migration. I could write a test setup that runs those files with an older version and then migrates to a newer version. What do you think?
Something like this: d5dd0c6. @def-, what do I need to change to keep the data from the previous mz instance? When I start the new instance, it is empty.
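A minimal sketch of the shape I have in mind, assuming hypothetical testdrive file names and that the flag can be set via system parameter defaults; keeping the data across the restart is exactly the open question above, and the actual workflow may differ:

from materialize.mzcompose.composition import Composition
from materialize.mzcompose.services.materialized import Materialized


def workflow_source_table_migration(c: Composition) -> None:
    # Create 'old style' sources on a pinned older release.
    with c.override(Materialized(image="materialize/materialized:v0.122.0")):
        c.up("materialized")
        c.run_testdrive_files("create-sources-old-syntax.td")  # hypothetical file
        c.kill("materialized")

    # Restart on the current version with the migration flag enabled; the data
    # created above has to survive this restart for validation to be useful,
    # and testdrive likely needs to be prevented from resetting state here.
    with c.override(
        Materialized(
            additional_system_parameter_defaults={
                # assumption: the flag can be enabled this way
                "force_source_table_syntax": "true",
            }
        )
    ):
        c.up("materialized")
        c.run_testdrive_files("validate-migrated-sources.td")  # hypothetical file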
If the migration fails then the …
for example …
Testdrive has a …
I didn't finish reviewing, but have to step away for a sec. Here are some initial comments/questions.
I used …
What is the recommended way to ensure that all sources were migrated, that is, that no unmigrated sources are still around? Go through sources and use …
Then start Materialize with …
Yes @nrainer-materialize, that's what we need to do in the workflow you added, and it would have caught this bug that Joe spotted: #30168 (comment)
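Purely as an illustration of the kind of check meant here (the type = 'subsource' filter is an assumption about how unmigrated sources would show up in the catalog, not a confirmed detail):

from materialize.mzcompose.composition import Composition


def assert_no_unmigrated_sources(c: Composition) -> None:
    # Assumption: after the migration the data lives in tables, so any
    # remaining subsources (progress sources have their own type and are
    # excluded) would indicate a source that was not migrated.
    rows = c.sql_query("SELECT name FROM mz_sources WHERE type = 'subsource'")
    assert not rows, f"unmigrated sources remain: {rows}"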
Force-pushed from d5dd0c6 to 6ecb9ff.
// Rewrite statement-embedded GlobalIds that have a replacement in id_map.
if let RawItemName::Id(id, _, _) = item_name {
    let parsed_id = id.parse::<GlobalId>().unwrap();
    if let Some(new_id) = self.id_map.get(&parsed_id) {
        *id = new_id.to_string();
        self.modified = true;
    }
}
What happens if we encounter a RawItemName::Name? Is that unexpected? If so, we should add an else branch with a panic/assert/unreachable.
My thought was that if it is a RawItemName::Name, then there is no contained id to update. This method is intended only to update ids and assumes the caller takes care of ensuring the name is copied from the item with the current global id to the new one.
Ah ok I see, the name will resolve to the new ID when it's looked up. That LGTM then.
When adding my test, I am running into this issue: …
Any idea what this is? Could this be a migration problem?
@nrainer-materialize that seems like a persist issue with using …
I figured it out together with Dennis. The issue was that I was going from v0.122.0 to v0.122.0-dev.
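(For anyone else who hits this: a pre-release like v0.122.0-dev sorts below the corresponding release, so that combination is effectively a downgrade. A tiny illustration, assuming MzVersion parses the -dev suffix:)

from materialize.mz_version import MzVersion

# 0.122.0-dev is a pre-release of 0.122.0 and therefore compares as *older*,
# so "upgrading" from v0.122.0 to v0.122.0-dev is really a downgrade.
assert MzVersion.parse_mz("v0.122.0-dev") < MzVersion.parse_mz("v0.122.0")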
I continued working on …
Force-pushed from 9f4276f to d4aa1bd.
…sorting of item updates
…rce-table-migration

# Conflicts:
#   src/adapter/src/catalog/apply.rs
#   src/adapter/src/catalog/migrate.rs
#   src/adapter/src/catalog/open.rs
#   src/catalog/src/durable/transaction.rs
#   src/sql/src/names.rs
#   test/testdrive-old-kafka-src-syntax/mzcompose.py
I'm switching to a new PR from my own fork: #30483. I can't get CI to build this since the author is no longer at the company.
def check_source_table_migration_test_sensible() -> None:
    assert MzVersion.parse_cargo() < MzVersion.parse_mz(
        "v0.130.0"
    ), "migration test probably no longer needed"


def get_old_image_for_source_table_migration_test() -> str:
    return "materialize/materialized:v0.122.0"
@def- Any chance that you know what the significance of these two versions is?
It's probably just supposed to be an old version to upgrade from; I don't think it actually matters.
Motivation
The subsequent PR will implement https://github.com/MaterializeInc/database-issues/issues/8678, which will also disable use of the 'old style' source statements using the same feature flag introduced here. Once this PR and that PR land, enabling the force_source_table_syntax flag will completely switch users over to the new syntax.

Tips for reviewer
To test this I've added a new scenario to platform-checks called ActivateSourceVersioningMigration, which runs Materialize on an existing version for each check's initialize() method and then restarts Materialize on the latest version with force_source_table_syntax enabled, activating the migration of any sources created using the 'old style' syntax. The validate() step is then run on this new version, confirming that all the queries continue to work. (A rough sketch of the scenario shape follows below.)

There are already existing platform-checks Checks that use the 'old style' source syntax: TableFromPgSource, TableFromMySqlSource, LoadGeneratorAsOfUpTo, and one I added called UpsertLegacy, which together cover the 4 source types we need to test. There are also many other checks that use the old syntax when running on 'old' versions before 0.119, but I wasn't sure how to make the ActivateSourceVersioningMigration scenario target a specific version rather than just the 'previous' version for the base run. @def- @nrainer-materialize let me know if you have ideas on doing that.

I've also updated the legacy upgrade tests to activate this migration after the upgrade, which should provide additional coverage too.
Nightly

https://buildkite.com/materialize/nightly/builds?branch=rjobanp%3Asource-table-migration
Checklist
If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.