materialize-sql: validations and migrations for only-null columns #2345

Draft: wants to merge 4 commits into base: main

williamhbaker (Member) commented Feb 5, 2025

Description:

WIP - The necessary plumbing is in place based on the struct updates from the updated flow pin and all existing tests pass. This can't be properly tested without a control plane build that populates write_inference though.

Adds handling, in both validate and apply, for fields that have only a null type or no types at all.

Columns for these fields can be created when a write_inference is present. Schema inference may have only ever observed a null value, or may never have observed a value at all, but we can still make a good guess at what kind of column to create based on the type from the write schema.
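As a rough illustration of the fallback described above, here is a minimal Python sketch (the connector itself is not written in Python). The names `COLUMN_TYPES`, `effective_types`, and `column_type` are hypothetical, and the type mapping is an assumed generic SQL mapping, not the one the connector uses.

```python
from typing import List, Optional

# Hypothetical mapping from JSON schema types to generic SQL column types.
COLUMN_TYPES = {
    "string": "TEXT",
    "integer": "BIGINT",
    "number": "DOUBLE PRECISION",
    "boolean": "BOOLEAN",
}


def effective_types(inferred: List[str], write_inference: Optional[List[str]]) -> List[str]:
    """Return the non-null types to map to a column.

    If inference has seen only null (or nothing at all), fall back to the
    non-null types from the write schema's inference, when present.
    """
    non_null = [t for t in inferred if t != "null"]
    if non_null:
        return non_null
    if write_inference is None:
        return []
    return [t for t in write_inference if t != "null"]


def column_type(inferred: List[str], write_inference: Optional[List[str]] = None) -> Optional[str]:
    """Pick a column type, or None if no single usable type is known."""
    types = effective_types(inferred, write_inference)
    if len(types) != 1:
        # No usable type, or multiple types: out of scope for this sketch.
        return None
    return COLUMN_TYPES.get(types[0])
```

With these assumptions, a field inferred as only `null` whose write schema says `string` would get a `TEXT` column, while the same field with no write_inference would still have no column type.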

Similarly, we can always migrate these columns, since we know that they have only ever had null values. A possible scenario is this: The write schema has type: string, but the source field is almost always null (or not present). Eventually a value shows up, but it is inferred as type: string, format: date-time. Although a TEXT column could not typically be migrated to a DATETIME column, we know all the existing values must be null, so the column can be trivially migrated by dropping and re-creating it with the new type.
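The validation rule implied here might look something like the following Python sketch. Everything in it is an assumption for illustration: `ExistingColumn`, its `only_null` flag (assumed to be derived from the previously applied spec), the `CASTABLE` allow-list, and `can_migrate` are hypothetical names, not part of the connector.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExistingColumn:
    name: str
    ddl_type: str
    # Assumption: known from the previously applied spec, where the field
    # had only a null type or no types at all.
    only_null: bool


# Hypothetical allow-list of (from_type, to_type) migrations that can be
# performed by casting existing values.
CASTABLE = {
    ("BIGINT", "DOUBLE PRECISION"),
    ("BIGINT", "TEXT"),
    ("DATETIME", "TEXT"),
}


def can_migrate(col: ExistingColumn, new_type: str) -> bool:
    """Decide whether a column may change to new_type."""
    if col.ddl_type == new_type:
        return True  # nothing to migrate
    if col.only_null:
        return True  # every existing value is null, so no cast is needed
    return (col.ddl_type, new_type) in CASTABLE
```

In the scenario above, `can_migrate(ExistingColumn("ts", "TEXT", only_null=True), "DATETIME")` is allowed, while the same column with `only_null=False` is rejected because TEXT to DATETIME is not in the assumed allow-list.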

Workflow steps:

(How does one use this feature, and how has it changed)

Documentation links affected:

(list any documentation links that you created, or existing ones that you've identified as needing updates, along with a brief description)

Notes for reviewers:

(anything that might help someone review this PR)



Updates the flow repo pin to one that includes `write_inference` in the proto
structs.
…write_inference

Instead of forbidding fields that can only be `null` or have no types at all
(cannot exist), use the `write_inference` of the field to try to figure out what
kind of column needs to be created for the field.

The idea is that the write schema may contain sufficient information to
pre-create a reasonable column type, even if schema inference has not observed a
non-null value for the field yet.
…_inference

Since these kinds of fields will now be allowed, we need to be able to map them
to columns appropriately.
Migrating a field that is known to have only ever been a null value previously
is conceptually easy, because the migration only requires creating the new
column and doesn't need any casting of values into it.

Getting all this plumbed through our existing SQL migration framework is more
involved though, since it is very much geared toward doing actual casting and
migrations, which makes sense.

If a migration targets a column that was previously only null, we skip the step
of casting values but, to keep things simple, still create the new column, drop
the original, and rename the new one, as we do with all other migrations.
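
The sequence described in this commit message can be sketched as below. This is an illustrative Python sketch only: `migration_statements`, the `_tmp` suffix, and the generic SQL it emits are assumptions, identifiers are left unquoted for brevity, and the real framework's statements will differ per dialect.

```python
from typing import List, Optional


def migration_statements(
    table: str,
    column: str,
    new_type: str,
    cast_expr: Optional[str] = None,
    previously_only_null: bool = False,
) -> List[str]:
    """Build the statements for migrating a column to new_type."""
    tmp = f"{column}_tmp"  # hypothetical temporary column name
    stmts = [f"ALTER TABLE {table} ADD COLUMN {tmp} {new_type};"]

    if not previously_only_null:
        # Regular migrations cast the existing values into the new column.
        if cast_expr is None:
            raise ValueError("cast_expr is required unless the column was only null")
        stmts.append(f"UPDATE {table} SET {tmp} = {cast_expr};")
    # For a previously only-null column the cast step is skipped entirely:
    # the new column starts out null, which already matches the old values.

    stmts.append(f"ALTER TABLE {table} DROP COLUMN {column};")
    stmts.append(f"ALTER TABLE {table} RENAME COLUMN {tmp} TO {column};")
    return stmts
```

For a previously only-null column this yields three statements (add, drop, rename); a regular migration adds the `UPDATE` cast step between the add and the drop.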