Unify executor workload queues#63491

Open
anishgirianish wants to merge 8 commits into apache:main from anishgirianish:refactor-workload-queue

@anishgirianish
Contributor

@anishgirianish anishgirianish commented Mar 12, 2026


Was generative AI tooling used to co-author this PR?
  • Yes (please specify the tool below)

Summary

Refactors executor workload queue management for extensibility. No behavioral change: scheduling order, slot accounting, and all provider executors work identically to before.

Follows the direction proposed by @ferruzzi in #62343 (comment).

Problem

Adding a new workload type (like ExecuteCallback or TestConnection) required touching ~6 places in BaseExecutor: a new queue dict, a new supports_* flag, the slots calculation, an isinstance branch in queue_workload, a dedicated scheduling method, and isinstance branches in the dequeue/trigger logic. Each provider executor that overrode queue_workload also needed updating. This made extending the executor interface unnecessarily painful.

What this does

Replaces the per-type queue dicts and boolean capability flags with three simple primitives:

  • executor_queues: a single defaultdict(dict) keyed by workload type string (e.g. "ExecuteTask", "ExecuteCallback") instead of separate queued_tasks / queued_callbacks dicts
  • supported_workload_types: a frozenset of type strings instead of individual supports_callbacks booleans
  • WORKLOAD_TYPE_PRIORITY + sort_key / queue_key: properties on each workload schema that control scheduling priority and queue indexing

The base class queue_workload is now generic: validate the type, store by key. Four provider executors (K8s, ECS, Batch, Lambda) no longer need their own queue_workload overrides. trigger_tasks becomes trigger_workloads since it handles all workload types now.
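As a rough illustration of the "validate the type, store by key" shape described above, here is a minimal sketch. It is not the PR's actual code: SketchWorkload, SketchExecutor, the `type` attribute, and the error message are all hypothetical stand-ins, and the real workload schemas expose queue_key and sort_key as properties rather than plain fields.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SketchWorkload:
    """Hypothetical stand-in for a workload schema; not the real Airflow class."""

    type: str  # workload type string, e.g. "ExecuteTask" or "ExecuteCallback"
    queue_key: str  # unique key used to index the workload in its queue
    sort_key: int  # ordering within a type (direction is an assumption here)


class SketchExecutor:
    """Hypothetical executor showing a generic queue_workload."""

    # Capability is a set of type strings rather than one boolean per type.
    supported_workload_types = frozenset({"ExecuteTask"})

    def __init__(self) -> None:
        # One inner dict per workload type, created lazily on first access.
        self.executor_queues = defaultdict(dict)

    def queue_workload(self, workload: SketchWorkload) -> None:
        # Validate the type, then store by key; no isinstance branches needed.
        if workload.type not in self.supported_workload_types:
            raise ValueError(f"{type(self).__name__} does not support {workload.type}")
        self.executor_queues[workload.type][workload.queue_key] = workload


executor = SketchExecutor()
executor.queue_workload(SketchWorkload("ExecuteTask", "ti-1", 0))
```

Because the method only inspects attributes of the workload, a provider executor in this sketch would have no reason to override it.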

Adding a new workload type after this refactor

  1. Define queue_key and sort_key on the workload schema
  2. Add the type string to supported_workload_types on supporting executors
  3. Handle the type in _process_workloads. Done.

No changes needed in BaseExecutor itself.
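The three steps above can be sketched as follows. Everything here is illustrative and assumed rather than taken from the PR: ConnectionCheckWorkload, BaseExecutorSketch, MyExecutor, the `type` attribute, and the key format are hypothetical, and the real _process_workloads signature may differ.

```python
from collections import defaultdict


class ConnectionCheckWorkload:
    """Hypothetical new workload; a stand-in for something like TestConnection."""

    type = "TestConnection"

    def __init__(self, conn_id: str) -> None:
        self.conn_id = conn_id

    # Step 1: define queue_key and sort_key on the workload schema.
    @property
    def queue_key(self) -> str:
        return f"test-connection:{self.conn_id}"

    @property
    def sort_key(self) -> int:
        return 0


class BaseExecutorSketch:
    """Simplified base class; it stays untouched when a new type is added."""

    supported_workload_types: frozenset = frozenset({"ExecuteTask"})

    def __init__(self) -> None:
        self.executor_queues = defaultdict(dict)

    def queue_workload(self, workload) -> None:
        if workload.type not in self.supported_workload_types:
            raise ValueError(f"Unsupported workload type: {workload.type}")
        self.executor_queues[workload.type][workload.queue_key] = workload


class MyExecutor(BaseExecutorSketch):
    # Step 2: advertise the new type on the supporting executor.
    supported_workload_types = frozenset({"ExecuteTask", "TestConnection"})

    def __init__(self) -> None:
        super().__init__()
        self.checked_connections = []

    # Step 3: handle the new type where workloads are processed.
    def _process_workloads(self, workloads) -> None:
        for workload in workloads:
            if workload.type == "TestConnection":
                self.checked_connections.append(workload.conn_id)
                del self.executor_queues[workload.type][workload.queue_key]


executor = MyExecutor()
check = ConnectionCheckWorkload("my_db")
executor.queue_workload(check)
executor._process_workloads([check])
```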


  • Read the Pull Request Guidelines for more information. Note: commit author/co-author name and email in commits become permanently public when merged.
  • For fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
  • When adding dependency, check compliance with the ASF 3rd Party License Policy.
  • For significant user-facing changes create newsfragment: {pr_number}.significant.rst, in airflow-core/newsfragments. You can add this file in a follow-up commit after the PR is created so you know the PR number.

@boring-cyborg boring-cyborg Bot added the area:Executors-core [LocalExecutor & SequentialExecutor], area:providers, provider:amazon [AWS/Amazon - related issues], provider:celery, provider:cncf-kubernetes [Kubernetes (k8s) provider related issues], and provider:edge [Edge Executor / Worker (AIP-69) / edge3] labels Mar 12, 2026
Comment thread airflow-core/src/airflow/executors/base_executor.py Outdated
Comment thread airflow-core/src/airflow/executors/base_executor.py
Contributor

@ferruzzi ferruzzi left a comment

Made a real quick pass and left some comments and questions; I'll try to get a more thorough one in tomorrow.

Comment thread airflow-core/src/airflow/executors/workloads/base.py Outdated
Comment thread airflow-core/src/airflow/executors/base_executor.py Outdated
Comment thread airflow-core/src/airflow/executors/base_executor.py
Comment thread airflow-core/src/airflow/executors/base_executor.py Outdated
Comment thread airflow-core/tests/unit/executors/test_base_executor.py
Comment thread airflow-core/src/airflow/executors/base_executor.py
@anishgirianish anishgirianish force-pushed the refactor-workload-queue branch 3 times, most recently from aee94fb to 8997ee4 Compare March 13, 2026 06:02
@anishgirianish anishgirianish marked this pull request as draft March 13, 2026 07:43
@anishgirianish anishgirianish force-pushed the refactor-workload-queue branch 6 times, most recently from 11ee7ef to 249b014 Compare March 14, 2026 04:40
@anishgirianish anishgirianish marked this pull request as ready for review March 14, 2026 05:45
@anishgirianish anishgirianish force-pushed the refactor-workload-queue branch 3 times, most recently from 0f6f172 to 35a927f Compare April 18, 2026 02:03
@anishgirianish anishgirianish marked this pull request as ready for review April 18, 2026 07:09
@potiuk potiuk added the ready for maintainer review Set after triaging when all criteria pass. label Apr 22, 2026
@eladkal eladkal requested review from kaxil and o-nikolas April 26, 2026 19:55
Comment thread airflow-core/src/airflow/executors/workloads/base.py Outdated
Comment thread airflow-core/src/airflow/executors/base_executor.py Outdated
Comment thread airflow-core/src/airflow/executors/base_executor.py Outdated
Comment thread airflow-core/src/airflow/executors/base_executor.py
Comment thread airflow-core/src/airflow/executors/base_executor.py
Comment thread airflow-core/src/airflow/executors/base_executor.py Outdated
@anishgirianish anishgirianish marked this pull request as draft April 28, 2026 07:20
@anishgirianish anishgirianish force-pushed the refactor-workload-queue branch from 1a0f949 to b6b161f Compare May 4, 2026 00:24
@anishgirianish anishgirianish marked this pull request as ready for review May 4, 2026 01:59
@anishgirianish anishgirianish requested a review from kaxil May 4, 2026 07:56
Contributor

@ferruzzi ferruzzi left a comment

Looks like my concerns were all addressed. LGTM

@ashb
Member

ashb commented May 14, 2026

I'm taking a look at this now.

Member

@ashb ashb left a comment

Thanks for tackling this!

Please add a newsfragment for the deprecated public API.

I'm also worried that an un-updated executor (either one of "our" ones where the user hasn't updated yet, or a custom one) would spam the living daylights out of the logs by accessing the deprecated queued_tasks property on every heartbeat. I think that needs addressing so it only logs once per class or once per instance.
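One possible way to log once per class is sketched below. This is an assumption-laden illustration, not the PR's implementation: BaseExecutorSketch, LegacyExecutor, and the _warned_classes registry are hypothetical, and the standard-library DeprecationWarning stands in for RemovedInAirflow4Warning.

```python
import warnings


class BaseExecutorSketch:
    """Hypothetical base class; only the warn-once mechanics are illustrated."""

    # Fully qualified names of classes that have already emitted the warning.
    _warned_classes: set = set()

    def __init__(self) -> None:
        self.executor_queues = {"ExecuteTask": {}}

    @property
    def queued_tasks(self) -> dict:
        cls = type(self)
        key = f"{cls.__module__}.{cls.__qualname__}"
        if key not in BaseExecutorSketch._warned_classes:
            # Record first, so repeated heartbeats stay silent afterwards.
            BaseExecutorSketch._warned_classes.add(key)
            warnings.warn(
                "queued_tasks is deprecated; use executor_queues instead.",
                DeprecationWarning,  # stand-in for RemovedInAirflow4Warning
                stacklevel=2,
            )
        return self.executor_queues["ExecuteTask"]


class LegacyExecutor(BaseExecutorSketch):
    """Hypothetical un-updated executor that still reads queued_tasks."""


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # bypass Python's own deduplication
    executor = LegacyExecutor()
    for _ in range(100):  # simulate many heartbeats
        executor.queued_tasks
```

Keying the registry by class rather than by instance means a second LegacyExecutor instance would also stay silent, which is the stricter of the two options mentioned.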

Comment thread airflow-core/src/airflow/executors/base_executor.py
  def queued_tasks(self) -> dict[TaskInstanceKey, Any]:
      """Return queued tasks from celery and kubernetes executor."""
-     return self.celery_executor.queued_tasks | self.kubernetes_executor.queued_tasks  # type: ignore[return-value]
+     queued_tasks = self.celery_executor.queued_tasks.copy()
Member

CeleryKubernetesExecutor.queued_tasks calls the deprecated BaseExecutor.queued_tasks property on both child executors, emitting RemovedInAirflow4Warning on every access. Since this file is already being updated in this PR, please migrate these call sites to use the new API (guarded with AIRFLOW_V_3_3_PLUS for back-compat with Airflow <3.3).

Contributor

This executor raises RuntimeError on Airflow 3.0+ (line 80), so this code path is unreachable on any version where executor_queues exists. We shouldn't be changing these files any more than is strictly needed to keep CI happy.

  self.team_name: str | None = team_name
- self.queued_tasks: dict[TaskInstanceKey, workloads.ExecuteTask] = {}
- self.queued_callbacks: dict[str, workloads.ExecuteCallback] = {}
+ self.executor_queues: dict[str, dict[WorkloadKey, QueueableWorkload]] = defaultdict(dict)
Member

Could this be a flat dict[WorkloadKey, QueueableWorkload] instead of a dict-of-dicts?

_get_workloads_to_schedule immediately flattens all sub-dicts into a single list and then sorts by (WORKLOAD_TYPE_PRIORITY, sort_key) — so priority ordering (callbacks before tasks, higher-weight tasks first) is entirely in the sort step, not in the dict structure. A flat dict would produce identical scheduling behaviour.

The only load-bearing use of the sub-dict grouping is the deprecated queued_tasks/queued_callbacks compat properties — which are on their way out. Every type-keyed deletion in providers (del self.executor_queues[WorkloadType.EXECUTE_TASK][key]) could be a plain del flat_dict[key] since WorkloadKey is unique across types.

A flat dict would simplify CeleryKubernetesExecutor.queued_tasks (no sub-dict merging), make provider-side deletion uniform, and remove the defaultdict(dict) nesting.
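The equivalence claimed here can be checked with a small sketch. The priority values, the (type, key, sort_key) tuples, and the assumption that lower values sort first are all illustrative and not taken from the PR; only the idea of sorting by (WORKLOAD_TYPE_PRIORITY, sort_key) after flattening comes from the discussion.

```python
from collections import defaultdict

# Assumed priorities: callbacks are scheduled before tasks.
WORKLOAD_TYPE_PRIORITY = {"ExecuteCallback": 0, "ExecuteTask": 1}

# Hypothetical workloads as (type, key, sort_key) tuples.
workloads = [
    ("ExecuteTask", "ti-low", 5),
    ("ExecuteCallback", "cb-1", 0),
    ("ExecuteTask", "ti-high", 1),
]


def schedule_order(items):
    """Sort by (type priority, sort_key); the container shape plays no role."""
    return sorted(items, key=lambda w: (WORKLOAD_TYPE_PRIORITY[w[0]], w[2]))


nested = defaultdict(dict)  # dict-of-dicts, keyed by type and then by key
flat = {}  # single dict, relying on keys being unique across types
for workload in workloads:
    nested[workload[0]][workload[1]] = workload
    flat[workload[1]] = workload

from_nested = schedule_order(w for queue in nested.values() for w in queue.values())
from_flat = schedule_order(flat.values())
```

In this sketch both containers yield the same ordered list, so the nesting only matters for code that reads one type's queue directly.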

Contributor

FWIW, the gate on line 80 of CeleryKubernetesExecutor throws a RuntimeError if Airflow version is >= 3.0, so I don't think "simplifying CeleryKubernetesExecutor" needs to factor into this decision.

That said, a flat dict isn't a bad idea in principle and likely would have been a cleaner choice in the first place. My main concern is mostly practical: Lambda (#63035), Celery (#63888), and Batch (#62984) have already merged using the executor_queues[WorkloadType.X][key] pattern, and ECS (#63657), K8s (#63454), and Edge (#63498) are all in progress implementing the same pattern. Flattening now would mean reworking all six executor implementations in addition to the changes that it would require in this PR. I'm not sure we really gain anything for that work?

How would you feel about adding a TODO to flatten it when the compat properties are removed (in 4.0?)? At that point the nested structure loses its main justification anyway. Does that seem reasonable, or is that just punting the same work to Future Us?

Comment thread airflow-core/src/airflow/executors/workloads/base.py Outdated
@anishgirianish anishgirianish force-pushed the refactor-workload-queue branch from 228e71b to 36bb7b9 Compare May 15, 2026 16:59
@anishgirianish
Contributor Author

anishgirianish commented May 15, 2026

Thank you so much @ashb and @ferruzzi for the review! Latest push:

  • warn-once-per-class for all heartbeat-frequency compat shims (queued_tasks, queued_callbacks, supports_callbacks, trigger_tasks, order_queued_tasks_by_priority) so logs don't flood
  • __init_subclass__ shim for legacy supports_callbacks = True: synthesizes supported_workload_types + warns
  • newsfragment added
  • WORKLOAD_TYPE_PRIORITY typing fixed
  • TODO at executor_queues for the 4.0 flat-dict migration
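For readers unfamiliar with the subclass-hook approach mentioned in the list above, a shim of that kind might look roughly like this. It is a sketch under assumptions: BaseExecutorSketch and LegacyExecutor are hypothetical names, DeprecationWarning stands in for the Airflow-specific warning class, and the real shim may synthesize the set differently.

```python
import warnings


class BaseExecutorSketch:
    """Hypothetical base class; the default supports tasks only."""

    supported_workload_types: frozenset = frozenset({"ExecuteTask"})

    def __init_subclass__(cls, **kwargs) -> None:
        super().__init_subclass__(**kwargs)
        # Act only when this subclass itself declares the legacy flag,
        # so classes that merely inherit it do not warn a second time.
        if cls.__dict__.get("supports_callbacks") is True:
            warnings.warn(
                f"{cls.__name__}.supports_callbacks is deprecated; "
                "declare supported_workload_types instead.",
                DeprecationWarning,  # stand-in for the Airflow warning class
                stacklevel=2,
            )
            # Synthesize the new-style capability set from the old flag.
            cls.supported_workload_types = cls.supported_workload_types | {"ExecuteCallback"}


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")

    class LegacyExecutor(BaseExecutorSketch):
        """Hypothetical un-updated executor using the old boolean flag."""

        supports_callbacks = True
```

Since the hook runs at class-definition time, the warning fires once per class by construction, with no heartbeat-frequency cost.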

Two threads I'd love your steer on:

  1. Flat dict: went with the TODO-to-4.0 compromise. Happy to flatten now if you'd rather.
  2. CeleryKubernetesExecutor.queued_tasks: left as-is since __init__ raises on Airflow ≥3.0, making the new branch unreachable. Happy to migrate if you'd still prefer it.

Would like to request your re-review. Will follow whichever way you'd both recommend. Thanks!

Labels

area:Executors-core [LocalExecutor & SequentialExecutor], area:providers, provider:amazon [AWS/Amazon - related issues], provider:celery, provider:cncf-kubernetes [Kubernetes (k8s) provider related issues], provider:edge [Edge Executor / Worker (AIP-69) / edge3], ready for maintainer review [Set after triaging when all criteria pass.]

8 participants