Destinations snowflake+bigquery: Improve performance by filtering raw table on extracted_at #31191

Merged on Oct 17, 2023 (25 commits)

edgao (Contributor) commented on Oct 9, 2023

closes #28380; based on airbytehq/typing-and-deduping-sql#23

Adds an _airbyte_extracted_at > ? filter to two queries in T+D (typing and deduping):

  • Inserting new raw records to the final table
  • Setting _airbyte_loaded_at = current_timestamp

And adds three tests to exercise this behavior.

This is a somewhat nontrivial change, so I'll roll it out to the internal workspace for bigquery first. If that works then I'll do a full release for both bigquery and snowflake (since the logic is basically identical for both destinations).
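
For readers who want to see the shape of the two filtered queries, here is a minimal sketch. It is not the connector's actual SQL: it uses SQLite in place of BigQuery or Snowflake, the table names raw_users and final_users and the name column are hypothetical, the raw table is simplified to plain columns instead of a JSON payload, and the _airbyte_loaded_at IS NULL condition and the choice of cutoff value are assumptions about the surrounding logic.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical raw and final table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE raw_users (
            _airbyte_raw_id TEXT PRIMARY KEY,
            name TEXT,
            _airbyte_extracted_at TEXT NOT NULL,
            _airbyte_loaded_at TEXT
        );
        CREATE TABLE final_users (
            _airbyte_raw_id TEXT,
            name TEXT,
            _airbyte_extracted_at TEXT
        );
        """
    )
    conn.executemany(
        "INSERT INTO raw_users VALUES (?, ?, ?, ?)",
        [
            # Old record, already processed by an earlier sync.
            ("a", "alice", "2023-10-01T00:00:00Z", "2023-10-01T00:05:00Z"),
            # New records from the current sync, not yet processed.
            ("b", "bob", "2023-10-09T00:00:00Z", None),
            ("c", "carol", "2023-10-09T00:00:01Z", None),
        ],
    )
    return conn


def run_filtered_queries(conn: sqlite3.Connection, min_extracted_at: str) -> None:
    """Run both statements with the extra _airbyte_extracted_at > ? filter."""
    # 1. Insert new raw records into the final table.
    conn.execute(
        """
        INSERT INTO final_users (_airbyte_raw_id, name, _airbyte_extracted_at)
        SELECT _airbyte_raw_id, name, _airbyte_extracted_at
        FROM raw_users
        WHERE _airbyte_loaded_at IS NULL      -- assumed pre-existing condition
          AND _airbyte_extracted_at > ?       -- the added filter
        """,
        (min_extracted_at,),
    )
    # 2. Mark those raw records as loaded.
    conn.execute(
        """
        UPDATE raw_users
        SET _airbyte_loaded_at = CURRENT_TIMESTAMP
        WHERE _airbyte_loaded_at IS NULL      -- assumed pre-existing condition
          AND _airbyte_extracted_at > ?       -- the added filter
        """,
        (min_extracted_at,),
    )
    conn.commit()
```

Calling run_filtered_queries(build_demo_db(), "2023-10-08T00:00:00Z") copies only the two October 9 records and leaves the already-loaded October 1 record untouched. SQLite gains nothing from the extra predicate here; the sketch only illustrates where the filter sits in each statement.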

vercel bot commented Oct 9, 2023

Deployment status: airbyte-docs ✅ Ready, updated Oct 17, 2023 5:31pm (UTC)

github-actions bot (Contributor) commented Oct 9, 2023

Before Merging a Connector Pull Request

Wow! What a great pull request you have here! 🎉

To merge this PR, ensure the following has been done/considered for each connector added or updated:

  • PR name follows PR naming conventions
  • Breaking changes are considered. If a Breaking Change is being introduced, ensure an Airbyte engineer has created a Breaking Change Plan.
  • Connector version has been incremented in the Dockerfile and metadata.yaml according to our Semantic Versioning for Connectors guidelines
  • You've updated the connector's metadata.yaml file with any other relevant changes, including a breakingChanges entry for major version bumps. See metadata.yaml docs
  • Secrets in the connector's spec are annotated with airbyte_secret
  • All documentation files are up to date. (README.md, bootstrap.md, docs.md, etc...)
  • Changelog updated in docs/integrations/<source or destination>/<name>.md with an entry for the new version. See changelog example
  • Migration guide updated in docs/integrations/<source or destination>/<name>-migrations.md with an entry for the new version, if the version is a breaking change. See migration guide example
  • If set, you've ensured the icon is present in the platform-internal repo. (Docs)

If the checklist is complete but the CI check is failing:

  1. Check for hidden checklists in your PR description

  2. Toggle the github label checklist-action-run on/off to re-run the checklist CI.

evantahler (Contributor) left a comment

I focused just on the SQL. Some nits, and a perf question, but nothing blocking.

octavia-squidington-iii added the area/documentation label (Improvements or additions to documentation) on Oct 17, 2023
edgao enabled auto-merge (squash) on October 17, 2023 15:53

airbyte-oss-build-runner (Collaborator) commented

destination-bigquery test report (commit 77db8103bf) - ✅

⏲️ Total pipeline duration: 03mn13s

Step Result
Build connector tar
Java Connector Unit Tests
Build destination-bigquery docker image for platform(s) linux/x86_64
Java Connector Integration Tests
Validate metadata for destination-bigquery
Connector version semver check
Connector version increment check
QA checks

Please note that tests only run on PRs that are ready for review. Set your PR to draft mode to avoid flooding the CI engine and upstream services on subsequent commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool, using the following command:

airbyte-ci connectors --name=destination-bigquery test

airbyte-oss-build-runner (Collaborator) commented

destination-snowflake test report (commit 77db8103bf) - ✅

⏲️ Total pipeline duration: 02mn45s

Step Result
Build connector tar
Java Connector Unit Tests
Build destination-snowflake docker image for platform(s) linux/x86_64
Java Connector Integration Tests
Validate metadata for destination-snowflake
Connector version semver check
Connector version increment check
QA checks

Please note that tests only run on PRs that are ready for review. Set your PR to draft mode to avoid flooding the CI engine and upstream services on subsequent commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool, using the following command:

airbyte-ci connectors --name=destination-snowflake test

edgao merged commit f1baf2a into master on Oct 17, 2023
edgao deleted the edgao/filter_td_on_extracted_at branch on October 17, 2023 17:48
ariesgun pushed a commit to ariesgun/airbyte that referenced this pull request on Oct 23, 2023

Successfully merging this pull request may close these issues.

T&D Queries should analyze less data where possible by only considering recently emitted rows
5 participants