Add FileScanConfigBuilder
#15352
- FileScanConfig::new(object_store_url, self.schema(), source)
+ FileScanConfigBuilder::new(object_store_url, self.schema(), source)
      .with_projection(projection.cloned())
-     .with_limit(limit);
+     .with_limit(limit)
+     .build();
@mertak-synnada, hey, this PR is still WIP but I was wondering if you're happy with this approach. That's what we've discussed in #14685 (comment)
Looking good so far, thank you! I haven't been able to test it thoroughly yet, but the legacy ParquetExecBuilder could be helpful for understanding specific cases, just FYI.
also fyi @AdamGS 👀
  // Finally, put it all together into a DataSourceExec
- Ok(file_scan_config.build())
+ Ok(Arc::new(DataSourceExec::new(Arc::new(file_scan_config))))
Isn't it possible to have this return as a function? Is it because of import cycles?
You mean as a DataSourceExec function? It also looks a bit verbose to me, but the inner Arc is needed for dynamic dispatch, and the outer one makes the return type more explicit. Happy to make it DataSourceExec::new_arc if you want, but I don't think we use that a lot in DataFusion.
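To make the two layers of Arc concrete, here is a minimal sketch using toy stand-ins. The names `DataSource`, `ScanConfig`, `SourceExec`, and `new_arc` below are hypothetical and are not DataFusion's real definitions; they only mirror the shape of the expression discussed above.

```rust
use std::sync::Arc;

// Hypothetical stand-in for a data-source trait; not DataFusion's real trait.
trait DataSource {
    fn name(&self) -> String;
}

// Toy configuration type that implements the trait.
struct ScanConfig {
    limit: Option<usize>,
}

impl DataSource for ScanConfig {
    fn name(&self) -> String {
        format!("scan(limit={:?})", self.limit)
    }
}

// Toy execution node that stores its source behind a trait object.
struct SourceExec {
    source: Arc<dyn DataSource>,
}

impl SourceExec {
    fn new(source: Arc<dyn DataSource>) -> Self {
        Self { source }
    }

    // Hypothetical convenience constructor that hides the outer Arc.
    fn new_arc(source: Arc<dyn DataSource>) -> Arc<Self> {
        Arc::new(Self::new(source))
    }
}

fn main() {
    // Inner Arc: Arc<ScanConfig> coerces to Arc<dyn DataSource> (dynamic dispatch).
    // Outer Arc: the node itself is shared, which is visible at the call site.
    let verbose: Arc<SourceExec> =
        Arc::new(SourceExec::new(Arc::new(ScanConfig { limit: Some(10) })));

    // Same result through the shorter helper.
    let short: Arc<SourceExec> = SourceExec::new_arc(Arc::new(ScanConfig { limit: None }));

    println!("{}", verbose.source.name());
    println!("{}", short.source.name());
}
```

Both forms produce the same value; the helper only trades the explicit outer `Arc::new` for a shorter call.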
my 2c - this looks great, I would love to also rename
Related to #14685 (comment)
Rationale for this change
FileScanConfig now violates single responsibility from SOLID. It serves two conflicting roles:
- As a builder, though this should be changed as discussed in datafusion/datafusion/datasource/src/file_scan_config.rs, Line 631 in 635e73b
- As a business logic provider (e.g., fn project, impl DataSource, etc.)
These conflicting roles lead to issues like #14905 and #14679, where provider features are accessed even before the build process is complete.
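The separation of roles described above can be sketched with toy types. This is only an illustration under assumptions: `ScanConfig`, `ScanConfigBuilder`, their fields, and `projected_columns` are hypothetical names and do not reproduce the actual DataFusion structs. The point is that provider logic lives only on the built value, so it cannot be called on a half-configured object.

```rust
// Toy types for illustration only; names and fields are assumptions.
#[derive(Debug, Clone, PartialEq)]
struct ScanConfig {
    columns: Vec<String>,
    projection: Option<Vec<usize>>,
    limit: Option<usize>,
}

impl ScanConfig {
    // Provider logic: available only on a fully built config.
    fn projected_columns(&self) -> Vec<String> {
        match &self.projection {
            // Out-of-range indices are skipped rather than panicking.
            Some(indices) => indices
                .iter()
                .filter_map(|&i| self.columns.get(i).cloned())
                .collect(),
            None => self.columns.clone(),
        }
    }
}

// Builder: only collects settings and exposes no provider logic.
struct ScanConfigBuilder {
    columns: Vec<String>,
    projection: Option<Vec<usize>>,
    limit: Option<usize>,
}

impl ScanConfigBuilder {
    fn new(columns: Vec<String>) -> Self {
        Self { columns, projection: None, limit: None }
    }

    fn with_projection(mut self, projection: Option<Vec<usize>>) -> Self {
        self.projection = projection;
        self
    }

    fn with_limit(mut self, limit: Option<usize>) -> Self {
        self.limit = limit;
        self
    }

    // Consumes the builder, so the finished config is produced exactly once.
    fn build(self) -> ScanConfig {
        ScanConfig {
            columns: self.columns,
            projection: self.projection,
            limit: self.limit,
        }
    }
}

fn main() {
    let config = ScanConfigBuilder::new(vec!["a".into(), "b".into(), "c".into()])
        .with_projection(Some(vec![2, 0]))
        .with_limit(Some(5))
        .build();

    println!("{:?}", config.projected_columns());
}
```

Because `projected_columns` is defined on `ScanConfig` and not on the builder, calling it before `build()` is a compile error in this sketch.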
What changes are included in this PR?
I've added FileScanConfigBuilder and deprecated the builder approach for FileScanConfig.
Are these changes tested?
Yes, updated existing tests to the new interface.
Are there any user-facing changes?
Yes, there is a new builder interface, but the switch is quite easy (all builder methods from FileScanConfig are supported).