-
Notifications
You must be signed in to change notification settings - Fork 14.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add dynamic task mapping into TaskSDK runtime #46032
Conversation
69bca00
to
ee19629
Compare
ee19629
to
99f8c68
Compare
This looks good to me. I’d be even more confident if the map_index changes are taken out, but I’ll take Ash’s words they don’t affect anything else if they’re left in. Aside from that it’s just the seemingly redundant |
Yeah, I'll back out the map index change and try/remove the |
99f8c68
to
35fca81
Compare
30c389e
to
a876cb0
Compare
a876cb0
to
313a98a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall LGTM.
providers/standard/tests/provider_tests/standard/operators/test_python.py
Show resolved
Hide resolved
b35bb7e
to
ce56376
Compare
The big change here (other than just moving code around) is to introduce a conceptual separation between Definition/Execution time and Scheduler time. This means that the expansion of tasks (creating the TaskInstance rows with different map_index values) is now done on the scheduler, and we now deserialize to different classes. For example, when we deserialize the `DictOfListsExpandInput` it gets turned into an instance of SchedulerDictOfListsExpandInput. This is primarily designed so that DB access is kept 100% out of the TaskSDK. Some of the changes here are on the "wat" side of the scale, and this is mostly designed to not break 100% of our tests, and we have #45549 to look at that more holistically. To support the "reduce" style task which takes as input a sequence of all the pushed (mapped) XCom values, and to keep the previous behaviour of not loading all values in to memory at once, we have added a new HEAD route to the Task Execution interface that returns the number of mapped XCom values so that it is possible to implement `__len__` on the new LazyXComSequence class. I have deleted a tranche of tests from tests/models that were to do with runtime behavoir and and now tested in the TaskSDK instead.
Co-authored-by: Jed Cunningham <[email protected]>
Co-authored-by: Jed Cunningham <[email protected]>
This is a "fake" conflict. |
ce36e6d
to
972df81
Compare
The big change here (other than just moving code around) is to introduce a conceptual separation between Definition/Execution time and Scheduler time. This means that the expansion of tasks (creating the TaskInstance rows with different map_index values) is now done on the scheduler, and we now deserialize to different classes. For example, when we deserialize the `DictOfListsExpandInput` it gets turned into an instance of SchedulerDictOfListsExpandInput. This is primarily designed so that DB access is kept 100% out of the TaskSDK. Some of the changes here are on the "wat" side of the scale, and this is mostly designed to not break 100% of our tests, and we have apache#45549 to look at that more holistically. To support the "reduce" style task which takes as input a sequence of all the pushed (mapped) XCom values, and to keep the previous behaviour of not loading all values in to memory at once, we have added a new HEAD route to the Task Execution interface that returns the number of mapped XCom values so that it is possible to implement `__len__` on the new LazyXComSequence class. I have deleted a tranche of tests from tests/models that were to do with runtime behavoir and and now tested in the TaskSDK instead.
The big change here (other than just moving code around) is to introduce a conceptual separation between Definition/Execution time and Scheduler time. This means that the expansion of tasks (creating the TaskInstance rows with different map_index values) is now done on the scheduler, and we now deserialize to different classes. For example, when we deserialize the `DictOfListsExpandInput` it gets turned into an instance of SchedulerDictOfListsExpandInput. This is primarily designed so that DB access is kept 100% out of the TaskSDK. Some of the changes here are on the "wat" side of the scale, and this is mostly designed to not break 100% of our tests, and we have apache#45549 to look at that more holistically. To support the "reduce" style task which takes as input a sequence of all the pushed (mapped) XCom values, and to keep the previous behaviour of not loading all values in to memory at once, we have added a new HEAD route to the Task Execution interface that returns the number of mapped XCom values so that it is possible to implement `__len__` on the new LazyXComSequence class. I have deleted a tranche of tests from tests/models that were to do with runtime behavoir and and now tested in the TaskSDK instead.
The big change here (other than just moving code around) is to introduce a
conceptual separation between Definition/Execution time and Scheduler time.
This means that the expansion of tasks (creating the TaskInstance rows with
different map_index values) is now done on the scheduler, and we now
deserialize to different classes. For example, when we deserialize the
DictOfListsExpandInput
it gets turned into an instance ofSchedulerDictOfListsExpandInput. This is primarily designed so that DB access
is kept 100% out of the TaskSDK.
Some of the changes here are on the "wat" side of the scale, and this is
mostly designed to not break 100% of our tests, and we have #45549 to look at
that more holistically.
To support the "reduce" style task which takes as input a sequence of all the
pushed (mapped) XCom values, and to keep the previous behaviour of not loading
all values in to memory at once, we have added a new HEAD route to the Task
Execution interface that returns the number of mapped XCom values so that it
is possible to implement
__len__
on the new LazyXComSequence class.I have deleted a tranche of tests from tests/models that were to do with
runtime behavoir and and now tested in the TaskSDK instead.
Testing:
I have been testing this with Celery Executor and the example_dynamic_task_mapping dag, which is this:
It now works, and the sum_it tasks logs show:
Closes #44360
^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named
{pr_number}.significant.rst
or{issue_number}.significant.rst
, in newsfragments.