T&D Queries should analyze less data where possible by only considering recently emitted rows #28380

Closed
evantahler opened this issue Jul 17, 2023 · 4 comments · Fixed by #31191
Assignees
Labels
1-stream~1-table change the format of the final table and any airbyte required tables team/destinations Destinations team's backlog

Comments

@evantahler
Contributor

evantahler commented Jul 17, 2023

Data warehouses often bill based on how much data a query analyzes. If we can add more WHERE clauses to limit which rows are considered, that will reduce the cost of each T&D query.

See https://github.com/airbytehq/typing-and-deduping-sql/pull/23/files for some ideas of what to do. Also, read this Slack thread.

Add a test case for this:

  • a sync crashing and having un-typed data from a previous run
@evantahler evantahler added the team/destinations Destinations team's backlog label Jul 17, 2023
@evantahler evantahler changed the title T&D Queries should analize less data where possible T&D Queries should analyze less data where possible Jul 17, 2023
@jbfbell jbfbell added the 1-stream~1-table change the format of the final table and any airbyte required tables label Jul 18, 2023
@edgao
Contributor

edgao commented Jul 25, 2023

potentially relevant past issue about legacy normalization partition pruning #14070

@evantahler
Contributor Author

^ tl;dr: save the results of MAX(foo) queries so we don't have to run them over and over again
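
A minimal sketch of what "save the results" could look like, assuming a simple in-process cache keyed by table and column. The class `MaxValueCache`, the table `events`, and the column `foo` are all hypothetical names for illustration, and nothing here reflects how the referenced issue or any Airbyte code handles it.

```python
import sqlite3


class MaxValueCache:
    """Remembers MAX(column) per (table, column) so each query runs once."""

    def __init__(self, conn):
        self.conn = conn
        self._cache = {}
        self.queries_run = 0  # counts how often the database was actually asked

    def get_max(self, table, column):
        key = (table, column)
        if key not in self._cache:
            # Identifiers are trusted in this sketch; real code should quote them.
            row = self.conn.execute(
                f"SELECT MAX({column}) FROM {table}"
            ).fetchone()
            self.queries_run += 1
            self._cache[key] = row[0]
        return self._cache[key]

    def invalidate(self, table, column):
        """Forget a saved value, e.g. after new rows were written."""
        self._cache.pop((table, column), None)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (foo INTEGER)")
conn.executemany("INSERT INTO events VALUES (?)", [(1,), (5,), (3,)])

cache = MaxValueCache(conn)
first = cache.get_max("events", "foo")   # runs the MAX query
second = cache.get_max("events", "foo")  # served from the saved result
print(first, second, cache.queries_run)
```

The trade-off is staleness: a saved value is only correct until more rows arrive, so the caller has to invalidate or refresh it whenever the table is written to.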

@jbfbell jbfbell assigned jbfbell and unassigned jbfbell Sep 18, 2023
@evantahler evantahler changed the title T&D Queries should analyze less data where possible T&D Queries should analyze less data where possible by only considering recently emitted rows Oct 3, 2023
@edgao edgao self-assigned this Oct 3, 2023
@edgao edgao removed their assignment Oct 9, 2023
@edgao
Contributor

edgao commented Oct 9, 2023

putting this down for now to focus on async standard inserts. there's a loose pr where I sketched out the interface diff (#31191), but I didn't get time to modify/add any test cases for this.

@evantahler
Contributor Author

@edgao I think you started this story perhaps?

3 participants