Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

source-postgres: Feature flag use_schema_inference #2337

Merged
merged 2 commits into from
Feb 4, 2025

Conversation

willdonnelly
Copy link
Member

@willdonnelly willdonnelly commented Feb 4, 2025

Description:

This PR adds a feature flag use_schema_inference to source-postgres which when set causes x-infer-schema: true to be added to discovered schemas.

Actually figuring out the right spot to add the x-infer-schema annotation was surprisingly tricky, what I ended up doing was adding a third "partial schema" to the top-level allOf, where the new schema is literally just {"x-infer-schema": true}. I think that works?


This change is Reviewable

@jgraettinger
Copy link
Member

This actually doesn't work: our handling of x-infer-schema is simplistic and we only inspect it if it's a property of the schema root. Would it be easier to instead unconditionally include x-infer-schema, but vary whether it's value is true or false ?

@willdonnelly willdonnelly force-pushed the wgd/postgres-schema-inference-20250204 branch from 5690cd7 to 386562d Compare February 4, 2025 17:44
@willdonnelly
Copy link
Member Author

This actually doesn't work: our handling of x-infer-schema is simplistic and we only inspect it if it's a property of the schema root

Ah, I see. Okay, I've uploaded a new implementation then, which I think puts it in the right spot. The generated schema will now look like:

{
    "$defs": {"...definition of the actual table properties..."},
    "allOf": [
        {"...the usual stuff..."},
        {"...and the reference to the main table properties definition..."}
    ],
    "x-infer-schema": true
}

Does that seem correct to you? I opted to still only add it conditionally because it was about as easy to do it either way and I prefer knowing that the default schema generation is 100% unaltered by this PR.

Copy link
Member

@jgraettinger jgraettinger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

The implementation of this flag is a little bit tricky for SQL
CDC connectors because the JSON schema generation is ultimately
handed off to the generic `sqlcapture` package, so I've opted to
add a property to the discovered tables indicating whether the
generated JSON schema ought to request schema inference.
@willdonnelly willdonnelly force-pushed the wgd/postgres-schema-inference-20250204 branch from 386562d to 042f769 Compare February 4, 2025 19:53
@willdonnelly willdonnelly merged commit b482c8a into main Feb 4, 2025
52 of 54 checks passed
@willdonnelly willdonnelly deleted the wgd/postgres-schema-inference-20250204 branch February 4, 2025 20:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants