Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(datasets): share the cache of Ibis connections #941

Merged
merged 5 commits into from
Dec 4, 2024

Conversation

deepyaman
Copy link
Member

@deepyaman deepyaman commented Nov 25, 2024

Description

Close #935

Development notes

The ConnectionMixin can also be used by something like pandas.SQLQueryDataset and pandas.SQLTableDataset to share a single connection cache. While this has some value (prevent storing the connections twice, if the user is using both datasets), it's less functionally important as in this specific Ibis case.

Checklist

  • Opened this PR as a 'Draft Pull Request' if it is work-in-progress
  • Updated the documentation to reflect the code changes
  • Added a description of this change in the relevant RELEASE.md file
  • Added tests to cover my changes
  • Received approvals from at least half of the TSC (required for adding a new, non-experimental dataset)

@deepyaman deepyaman force-pushed the fix/datasets/share-connection-mixin branch 4 times, most recently from d01ac44 to 2c54abc Compare November 25, 2024 05:48
@noklam noklam self-requested a review November 25, 2024 14:27
@merelcht merelcht mentioned this pull request Nov 25, 2024
7 tasks
@deepyaman deepyaman marked this pull request as ready for review November 26, 2024 01:29
@noklam
Copy link
Contributor

noklam commented Nov 26, 2024

I haven't looked into this at all, but my intuition is that it stems from #842 (comment); because the connections are different, temporary tables are not shared.

I can try to look into creating a shared cache, but no other dataset follows this pattern (they are pretty independent the way they're currently designed); not sure if @merelcht @astrojuanlu @idanov @noklam any of you may have thoughts on this. #935 (comment)

Quoting this for future reference. This is not just a refactoring, but necessary as temp table are only accessible by the same connection.

Copy link
Member

@astrojuanlu astrojuanlu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Non blocking comment from the peanut gallery: Are there other ways of doing this without using Mixins?

I think we should do less inheritance, not more. Let alone multiple inheritance.

@deepyaman deepyaman force-pushed the fix/datasets/share-connection-mixin branch from 2448b80 to 05da4d6 Compare November 27, 2024 19:57
@deepyaman
Copy link
Member Author

deepyaman commented Nov 27, 2024

Non blocking comment from the peanut gallery: Are there other ways of doing this without using Mixins?

I think we should do less inheritance, not more. Let alone multiple inheritance.

I'm not 100% sure what the best alternative would be, and would need to do some reading/experimentation to figure it out. That said, I think this can definitely be refactored down the road, as it doesn't update the public interface.

Edit: https://python-patterns.guide/gang-of-four/composition-over-inheritance/ also has some ideas? But I haven't looked into it sufficiently to see what I may like.

@deepyaman
Copy link
Member Author

Blocked by #949

@deepyaman deepyaman force-pushed the fix/datasets/share-connection-mixin branch from 05da4d6 to 449a017 Compare November 28, 2024 17:20
@deepyaman deepyaman enabled auto-merge (squash) November 28, 2024 17:20
@merelcht merelcht disabled auto-merge December 3, 2024 16:45
Copy link
Member

@merelcht merelcht left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy with solution 👍 I do agree it would be good to see if we can achieve the same without inheritance, but since it's now only used between the Ibis datasets I don't see it as too big of an issue.

Should this be added to the release notes?

@deepyaman
Copy link
Member Author

Should this be added to the release notes?

Yes, will do!

@deepyaman deepyaman enabled auto-merge (squash) December 4, 2024 00:19
@deepyaman deepyaman merged commit 1d768ad into main Dec 4, 2024
12 of 13 checks passed
@deepyaman deepyaman deleted the fix/datasets/share-connection-mixin branch December 4, 2024 02:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

kedro-datasets: ibis.FileDataset w/ ibis.TableDataset in pipeline
4 participants