Add deferrable mode to SFTPOperator #65480
sunildataengineer wants to merge 5 commits into apache:main from
Conversation
|
Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contributors' Guide
|
|
Thanks for adding deferrable support to
[Blocker 1] Circular import forces inner imports:
Fix: extract
[Blocker 2] Missing type annotations on
def __init__(
    self,
    ssh_conn_id: str | None = None,
    local_filepath: str | list[str] | None = None,
    remote_filepath: str | list[str] = "",
    operation: str = SFTPOperation.PUT,
    confirm: bool = True,
    create_intermediate_dirs: bool = False,
    remote_host: str | None = None,  # see Blocker 3
    concurrency: int = 1,  # see Blocker 3
    prefetch: bool = True,  # see Blocker 3
) -> None:
Same for
[Blocker 3]
# _run_transfer: forward remote_host at minimum
sftp_hook = SFTPHook(ssh_conn_id=self.ssh_conn_id, remote_host=self.remote_host or "")
Also add these three params to
[Blocker 4] Directory transfers broken in deferrable mode:
Suggested fix: extract the transfer loop from
[Blocker 5] Deprecated since Python 3.10+. Use
loop = asyncio.get_running_loop()
await loop.run_in_executor(None, self._run_transfer)
[Minor 1]
[Minor 2] Redundant imports inside test methods:
[Minor 3] Missing trigger tests: Please add tests for
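For readers following Blocker 5, here is a minimal, self-contained sketch of the suggested pattern: fetch the running loop from inside a coroutine and hand the blocking call to the default executor. The `blocking_transfer` function is a hypothetical stand-in for the trigger's `_run_transfer`; nothing here is taken from the PR's actual code.

```python
import asyncio
import time


def blocking_transfer() -> str:
    """Hypothetical stand-in for a synchronous SFTP transfer."""
    time.sleep(0.05)  # simulate blocking I/O
    return "done"


async def run_transfer_async() -> str:
    # get_running_loop() only works inside a coroutine or callback,
    # which is exactly where a trigger's run() method executes.
    loop = asyncio.get_running_loop()
    # Passing None selects the loop's default ThreadPoolExecutor, so the
    # blocking call does not stall other coroutines on the same loop.
    return await loop.run_in_executor(None, blocking_transfer)


result = asyncio.run(run_transfer_async())
```

The blocking work still occupies a thread in the executor; the gain is that the event loop itself stays free to service other coroutines while the transfer runs.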
|
|
Thank you for the detailed review @srchilukoori. The overall approach feedback is very encouraging. I will
Will push the updated code shortly.
9d35646 to 72006e1
|
@sunildataengineer Converting to draft — this PR doesn't yet meet our Pull Request quality criteria.
See the linked criteria for how to fix each item, then mark the PR "Ready for review". This is not a rejection, just an invitation to bring the PR up to standard. No rush.
Note: This comment was drafted by an AI-assisted triage tool and may contain mistakes. Once you have addressed the points above, an Apache Airflow maintainer, a real person, will take the next look at your PR. We use this two-stage triage process so that our maintainers' limited time is spent where it matters most: the conversation with you.
|
@srchilukoori All review feedback has been addressed:
All 6 tests passing locally. Ready for re-review.
|
Good progress: all 5 original blockers are resolved. A few remaining issues are likely causing the CI failures, plus two consistency nits.
[Blocker] Inner imports still in
The circular import is gone now that
import os
from pathlib import Path
from airflow.providers.sftp.constants import SFTPOperation
from airflow.providers.sftp.hooks.sftp import SFTPHook
[Blocker] Inner imports still in new test methods:
All three
[Fix]
# current (wrong): overrides the connection's host with "" when remote_host is None
sftp_hook = SFTPHook(ssh_conn_id=self.ssh_conn_id, remote_host=self.remote_host or "")
# fix: pass None so the hook uses the connection's host
sftp_hook = SFTPHook(ssh_conn_id=self.ssh_conn_id, remote_host=self.remote_host)
[Nit] Use
# current
operation: str = "put",
# preferred: consistent with SFTPOperator
operation: str = SFTPOperation.PUT,
Once the inner imports are moved to module level the static checks should pass. Run
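The [Fix] item above turns on the difference between passing `None` and passing `""`. The sketch below illustrates that difference with a hypothetical `resolve_host` function that treats only `None` as "not provided", as the review describes. It is a stand-in for illustration and does not reproduce the real `SFTPHook` logic; `CONNECTION_HOST` is an invented value.

```python
from __future__ import annotations

# Hypothetical host stored on the Airflow connection.
CONNECTION_HOST = "sftp.example.com"


def resolve_host(remote_host: str | None) -> str:
    """Stand-in for a hook that lets an explicit remote_host win over the connection."""
    if remote_host is not None:
        return remote_host  # any string, even "", counts as an explicit override
    return CONNECTION_HOST


remote_host = None  # the operator was created without remote_host

# `None or ""` evaluates to "", which the stand-in treats as an override.
overridden = resolve_host(remote_host or "")
# Passing None through unchanged lets the connection's host apply.
fallback = resolve_host(remote_host)
```

Under this assumption, `overridden` ends up as an empty string while `fallback` keeps the connection's host, which is the behaviour the reviewer asks for.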
e2f65ee to 146dcd2
|
@srchilukoori Fixed all ruff linting issues.
Adds deferrable=True parameter to SFTPOperator which allows the operator to defer execution to SFTPOperatorTrigger, freeing the worker slot during file transfers instead of blocking it.
- Add SFTPOperatorTrigger class in triggers/sftp.py
- Add deferrable param to SFTPOperator.__init__()
- Add defer() call in execute() when deferrable=True
- Add execute_complete() callback method
- Add unit tests for deferrable mode
Closes apache#65475
- Extract SFTPOperation into constants.py to fix circular import
- Add full type annotations to SFTPOperatorTrigger
- Add remote_host, concurrency, prefetch params to trigger
- Fix directory transfer support in deferrable mode
- Replace deprecated asyncio.get_event_loop() with get_running_loop()
- Add remote_host, concurrency, prefetch to serialize()
- Remove inner imports from test methods
- Add SFTPOperatorTrigger tests: serialize roundtrip, run success, run error
cc927c6 to a20bc92
|
Rebased on latest upstream main. All 6 tests passing locally.
|
What this PR does
Adds a deferrable=True parameter to SFTPOperator, which allows the operator to defer execution to SFTPOperatorTrigger, freeing the worker slot during file transfers instead of blocking it.
Currently, SFTPOperator blocks a worker slot for the entire duration of every file transfer. For large file transfers, this wastes worker resources unnecessarily.
Changes
- Add SFTPOperatorTrigger class in triggers/sftp.py
- Add deferrable parameter to SFTPOperator.__init__() with default False (fully backward compatible)
- Add self.defer() call in execute() when deferrable=True
- Add execute_complete() callback method to handle trigger result
Tests
- test_sftp_operator_defers_when_deferrable_true
- test_sftp_operator_execute_complete_success
- test_sftp_operator_execute_complete_raises_on_error
Closes #65475
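The execute_complete() callback listed under Changes handles whatever the trigger reports back. A rough, self-contained sketch of that idea follows. It is not the PR's code: the event shape (a dict with "status" and "message" keys) is an assumption, and RuntimeError is used in place of an Airflow exception class so the example runs without Airflow installed.

```python
from __future__ import annotations

from typing import Any


def execute_complete(event: dict[str, Any] | None) -> str:
    """Hypothetical callback: return the trigger's message or raise on failure."""
    # Assumed event shape: {"status": "success" | "error", "message": str}
    if event is None or event.get("status") != "success":
        message = (event or {}).get("message", "no event received")
        raise RuntimeError(f"SFTP transfer failed: {message}")
    return str(event.get("message", ""))


ok = execute_complete({"status": "success", "message": "transfer finished"})

try:
    execute_complete({"status": "error", "message": "connection refused"})
    raised = False
except RuntimeError:
    raised = True
```

The two calls mirror the success and raises-on-error test cases named in the Tests list, although the real tests exercise the operator method rather than a free function.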