Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add hooks to SchemaAdapter to add custom column generators #15261

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

adriangb
Copy link
Contributor

@adriangb adriangb commented Mar 16, 2025

Closes #15220

A lot of the work of this PR is meant to resolve #15220 (comment). I think I'll move that into a standalone PR.

@github-actions github-actions bot added core Core DataFusion crate datasource Changes to the datasource crate labels Mar 16, 2025
@adriangb adriangb marked this pull request as draft March 16, 2025 20:48
Comment on lines 279 to 280
let err = adapter.map_schema(&file_schema).unwrap_err().to_string();
assert_eq!(err, "Error during planning: Column a is missing from the file schema, cannot be generated, and is non-nullable");
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's nice that this fails earlier 😄

@adriangb
Copy link
Contributor Author

I've moved the complex bit over to #15263. I'll let that settle first then resume work here.

@adriangb
Copy link
Contributor Author

Noting that in #15263 (comment) I realized that it might be good to have a system to report stats for columns that will be generated before they are generated (is it all nulls? is it a constant?) to be used with stats pruning.

@adriangb
Copy link
Contributor Author

Now that #15263 is merged I'll come back here and:

  1. Resolve conflicts.
  2. Add an API for the SchemaAdapter to declare the stats for potentially generated columns if they are known ahead of time.

@adriangb adriangb force-pushed the defaults-schema-adapter branch from 896eb8b to 74b884f Compare March 20, 2025 17:15
@adriangb adriangb force-pushed the defaults-schema-adapter branch from de02a9a to 0bdebc8 Compare March 20, 2025 17:20
@adriangb adriangb marked this pull request as ready for review March 20, 2025 17:20
@adriangb
Copy link
Contributor Author

Marking as ready for review. The main TODO is an API for transmitting statistics information for generated columns before they get generated, but that can even be a followup PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate datasource Changes to the datasource crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support default values for columns in SchemaAdapter
1 participant