New dataset databricks.ExternalTableDataset #349
Comments
Hi @KrzysztofDoboszInpost, thanks for opening this issue. Do you want to take over #251? Checking out the branch and opening a new PR should suffice.
Sure, as soon as I'm able to :)
This deserves some investigation indeed :) Let's continue the discussion here until we're clear on the path forward. I'll add this to our backlog.
An experimental
Description
The existing databricks.ManagedTableDataset does not allow specifying the location of the stored files, which is crucial in some setups. There is already PR #251 for this, but it seems to be stale.
Context
I develop a number of Kedro projects that are deployed to Databricks. A single dataset that handles both pandas and Spark DataFrames and can write to (and read from) a Databricks database would be a lifesaver, as long as I could specify the path.
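To make the request concrete, here is a hypothetical sketch of the desired usage. ExternalTableDataset and its location argument do not exist yet, and the ManagedTableDataset arguments shown are assumed for illustration only.

```python
# Hypothetical sketch: ExternalTableDataset and the `location` argument do not
# exist in kedro-datasets today; names and parameters are illustrative.
from kedro_datasets.databricks import ManagedTableDataset

# Today: the table location cannot be set, so the table is always managed.
features_managed = ManagedTableDataset(
    table="features",
    database="analytics",
    write_mode="overwrite",
)

# Requested: an external-table variant where a storage path makes the table external.
# features_external = ExternalTableDataset(
#     table="features",
#     database="analytics",
#     location="abfss://data@myaccount.dfs.core.windows.net/features",  # hypothetical
# )
```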
Possible Implementation
In Spark, it suffices to add the path option to make a table external. I'm not sure whether it would be as simple here, though; see the sketch below.
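For reference, a minimal PySpark sketch of the path option described above; the table name and storage path are placeholders.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])

# Supplying a `path` makes the saved table external (unmanaged): the data lives
# at the given location instead of a metastore-managed location.
(
    df.write.format("delta")
    .mode("overwrite")
    .option("path", "abfss://data@myaccount.dfs.core.windows.net/my_table")  # placeholder path
    .saveAsTable("analytics.my_table")
)
```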
Possible Alternatives
Adding an argument to ManagedTableDataset is also an option, but then the table wouldn't really be managed, which might cause some confusion.