GenericTransfer: switch paginated mode to non-deferred default with opt-in deferral#64321
Conversation
There was a problem hiding this comment.
Pull request overview
Adds an explicit deferrable/non-deferrable switch to GenericTransfer’s paginated mode so pagination can run eagerly (non-deferred) for backends where deferral is problematic.
Changes:
- Introduces a
deferrableparameter toGenericTransferand implements an eager pagination loop whendeferrable=False. - Updates unit tests to cover both eager (non-deferred) and deferred paginated execution.
- Updates the public
.pyistub to expose the newdeferrableparameter.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
providers/common/sql/src/airflow/providers/common/sql/operators/generic_transfer.py |
Adds deferrable param and non-deferred pagination execution path. |
providers/common/sql/src/airflow/providers/common/sql/operators/generic_transfer.pyi |
Updates public typing stub to include deferrable. |
providers/common/sql/tests/unit/common/sql/operators/test_generic_transfer.py |
Adds/adjusts tests for non-deferred and deferred paginated reads. |
potiuk
left a comment
There was a problem hiding this comment.
Functionally clean and CI is green. One blocker before this can land:
1. Document the default behaviour change in the provider changelog. Before this PR, paginated mode (page_size + str SQL) always deferred; after, it runs synchronously by default unless [operators] default_deferrable = true is set. That's a silent behaviour change for any existing DAG using paginated GenericTransfer. The provider's changelog.rst carves out exactly this case at the top: "only add notes to the Changelog… when there are some breaking changes and you want to add an explanation to the users on how they are supposed to deal with them." Please add an entry under the next-version heading in providers/common/sql/docs/changelog.rst covering the switch and the two ways to restore the old behaviour (pass deferrable=True per task, or set the global [operators] default_deferrable=true).
Two non-blocking suggestions:
2. The PR title reads as additive ("Add support for non-deferred pagination") but the PR also flips the default. Something like GenericTransfer: switch paginated mode to non-deferred default with opt-in deferral would surface that in the release notes.
3. Both test_non_deferred_paginated_read and test_deferred_paginated_read pass deferrable explicitly. A small test that omits the kwarg under default config and asserts the non-deferred branch is hit would lock in the contract.
The .pyi and "always-deferred behaviour" notes Copilot already raised look correctly addressed — not re-litigating.
Happy to take another look once the changelog entry is in.
This review was drafted by an AI-assisted tool and may contain mistakes. An Apache Airflow maintainer (@potiuk) has read and confirmed it before submission. See contributing-docs/05_pull_requests.rst#gen-ai-assisted-contributions for what counts as a maintainer review.
…c-transfer # Conflicts: # providers/common/sql/docs/changelog.rst
|
@potiuk think all remarks have been addressed. |
potiuk
left a comment
There was a problem hiding this comment.
Thanks for the changes — title, changelog, and default-config test all addressed. Approving. Three small nits inline as optional follow-ups; the version-heading comment from earlier still stands and should be dropped before merge (release manager assigns the version).
Drafted-by: Claude Opus 4.7; reviewed by @potiuk before posting
Add support for non-deferred pagination mode in GenericTransfer, as until now only deferred was supported which can be problematic with some backends.
Was generative AI tooling used to co-author this PR?
{pr_number}.significant.rst, in airflow-core/newsfragments. You can add this file in a follow-up commit after the PR is created so you know the PR number.