AIP-104: Dynamic Task Iteration and Dynamic Task Partitioning#62922
Thanks for working on this — excited to see DTI taking shape for Airflow 3.2. I've gone through the full diff and have feedback on the implementation, some are bugs that would crash at runtime, others are design choices worth iterating on.
A few high-level things:

- **No tests.** ~700 lines of new production code with zero test coverage. We need tests for `IterableOperator`, `TaskExecutor`, `MappedTaskInstance`, `HybridExecutor`, `XComIterable`, `DecoratedDeferredAsyncOperator`, and the `iterate`/`iterate_kwargs` methods — covering success, failure, retry, deferral, and edge cases.
- **Worker resilience.** Since DTI runs N sub-tasks inside a single worker process, we need to think through what happens when that worker dies mid-execution — the scheduler has no record of which sub-tasks completed. Worth documenting the expected behavior and trade-offs here (and whether we want to add checkpointing later).
- **Thread safety.** Several shared mutable structures (the `context` dict, `os.environ`) are accessed concurrently from multiple threads without synchronization. This needs to be addressed before merge.
Inline comments below with specifics.
Thanks for pointing this out. As mentioned earlier on Slack, this PR is currently intended as an initial draft to demonstrate the concept and gather early architectural feedback. I agree that proper test coverage is essential before this can move forward. The plan is to add unit tests covering the components you mentioned (IterableOperator, TaskExecutor, MappedTaskInstance, HybridExecutor, XComIterable, DecoratedDeferredAsyncOperator, and the iterate/iterate_kwargs APIs), including scenarios for success, retries, failures, deferral, and edge cases. Once we converge on the architectural direction, I will add the corresponding test suite.
I agree this is an important architectural concern and worth discussing further. The goal of this prototype is to explore a trade-off between observability and scheduling overhead; @ashb and @potiuk raised the same point earlier. If we try to preserve the same visibility and lifecycle guarantees as Dynamic Task Mapping, we essentially end up re-implementing DTM semantics, which brings back the same scheduler overhead that this approach is trying to avoid. This proposal intentionally explores a different point in that trade-off space: executing iterations within a single task while allowing controlled parallelism. That does mean the scheduler has less visibility into the internal execution units, but it also carries less load.
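To make the trade-off concrete, here is a minimal sketch of the idea being discussed: one scheduler-visible task internally fans out N execution units with bounded parallelism. All names here are illustrative, not the PR's actual API.

```python
from concurrent.futures import ThreadPoolExecutor


def run_iterations(items, fn, max_workers=4):
    """Run fn over items inside a single task with bounded parallelism.

    Hypothetical sketch: the scheduler sees one task instance, while the
    worker process runs the individual iterations itself.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with items.
        return list(pool.map(fn, items))


results = run_iterations(range(5), lambda i: i * i, max_workers=2)
# results == [0, 1, 4, 9, 16]
```

The cost of this shape is exactly the resilience concern above: if the worker dies, the scheduler only knows the state of the single enclosing task, not which iterations finished.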
Good point — thread safety needs to be handled carefully here. Regarding the task context, my understanding is that operators already receive a per-task context instance, but you're right that when running iterations concurrently we should avoid sharing mutable structures across threads. One possible approach would be to create a shallow or deep copy of the context for each execution unit to ensure isolation. If you have concerns about specific structures (e.g., `os.environ` or others), I'm happy to address them and introduce appropriate synchronization or isolation mechanisms where needed.
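The per-unit copy approach mentioned above can be sketched as follows. This is a hypothetical illustration (the `execute_unit` helper and the context shape are made up for the example); it only shows that deep-copying the context per thread keeps concurrent mutations isolated.

```python
import copy
from concurrent.futures import ThreadPoolExecutor


def execute_unit(shared_context, index):
    # Deep-copy so each execution unit mutates its own context only.
    ctx = copy.deepcopy(shared_context)
    ctx["map_index"] = index
    return ctx["map_index"]


base = {"ti": "task-instance-stub", "map_index": -1}
with ThreadPoolExecutor(max_workers=4) as pool:
    indexes = list(pool.map(lambda i: execute_unit(base, i), range(4)))

# The shared context is untouched despite concurrent "mutation".
assert base["map_index"] == -1
```

Note this does not cover process-global state like `os.environ`, which cannot be copied per thread and would still need a lock or an explicit no-mutation rule.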
Was generative AI tooling used to co-author this PR?
Github Copilot with Claude Opus 4.6 for some parts like setting up tests or improving documentation.
Description
This PR is the initial implementation of Dynamic Task Iteration (DTI), as discussed in the devlist and building upon the foundations of AIP-104.
For further context on the use cases and performance benefits of DTI, see this Medium Article.
The XCom Database Constraint Challenge
While porting our internal "monkey-patched" version of DTI (used since Airflow 2.x) to the core, I've identified a significant technical hurdle regarding XCom handling.
The Issue
Around Airflow 2.10/2.11, a change was introduced to the database constraints for the XCom table. Specifically:
Current Workaround in this PR
To maintain functionality without immediate schema changes, I have implemented a custom XComIterable. This appends the index directly to the XCom key to bypass the constraint and manages the iteration logic internally.
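The key-suffixing scheme can be illustrated with a small sketch. This is not the PR's `XComIterable` implementation — the helper names and the dict-backed store are stand-ins — it only demonstrates how appending the iteration index to the key keeps each value unique under the existing constraint, and how the iterable side can reassemble the results.

```python
def indexed_key(key, index):
    # Hypothetical key scheme: suffix the XCom key with the iteration
    # index so each iteration stores a distinct row.
    return f"{key}__{index}"


def push(store, key, index, value):
    store[indexed_key(key, index)] = value


def pull_all(store, key):
    # Reassemble iteration results by probing consecutive indexes
    # until the first missing key.
    out, i = [], 0
    while (k := indexed_key(key, i)) in store:
        out.append(store[k])
        i += 1
    return out


store = {}
for i, v in enumerate(["a", "b", "c"]):
    push(store, "return_value", i, v)
# pull_all(store, "return_value") == ["a", "b", "c"]
```

The downside, as noted below, is that this encodes structure into the key rather than the schema, which is why adjusting the DB constraint looks like the cleaner long-term fix.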
I believe the cleanest path forward is to adjust the DB constraint to allow indexed XComs even in the absence of an indexed TI. This would:
What this PR doesn't implement yet
The partitioning feature, i.e. combining Dynamic Task Mapping with Dynamic Task Iteration in one fluent API.
It also doesn't take pools into account yet; at the moment concurrency is controlled via the `max_active_tis_per_dag` parameter, which defaults to `os.cpu_count()` when not set.
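The default-resolution behavior described above amounts to the following (the `resolve_parallelism` helper name is made up for illustration):

```python
import os


def resolve_parallelism(max_active_tis_per_dag=None):
    # Fall back to os.cpu_count() when the parameter is not set.
    # os.cpu_count() can return None on some platforms, so guard with 1.
    return max_active_tis_per_dag or os.cpu_count() or 1


assert resolve_parallelism(8) == 8
assert resolve_parallelism() >= 1
```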