Skip to content

Add ExecuteCallback support to Edge Executor#63498

Draft
wjddn279 wants to merge 13 commits intoapache:mainfrom
wjddn279:edge-worker-callback-workload
Draft

Add ExecuteCallback support to Edge Executor#63498
wjddn279 wants to merge 13 commits intoapache:mainfrom
wjddn279:edge-worker-callback-workload

Conversation

@wjddn279
Copy link
Copy Markdown
Contributor

@wjddn279 wjddn279 commented Mar 13, 2026

related: #62887

NOTE

Make draft pr to discuss direction of implementation with code owners


Was generative AI tooling used to co-author this PR?
  • Yes (please specify the tool below)

  • Read the Pull Request Guidelines for more information. Note: commit author/co-author name and email in commits become permanently public when merged.
  • For fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
  • When adding dependency, check compliance with the ASF 3rd Party License Policy.
  • For significant user-facing changes create newsfragment: {pr_number}.significant.rst, in airflow-core/newsfragments. You can add this file in a follow-up commit after the PR is created so you know the PR number.

@boring-cyborg boring-cyborg Bot added area:providers provider:edge Edge Executor / Worker (AIP-69) / edge3 labels Mar 13, 2026
@wjddn279
Copy link
Copy Markdown
Contributor Author

wjddn279 commented Mar 13, 2026

@jscheffl @dheerajturaga @ferruzzi

Hello! Following the multi-team work, I've been assigned to add functionality to the edge executor. As before, I've opened a draft PR first to discuss the direction before starting development.

This task involves changing the Deadline Alerts callback work so that it can be executed on the worker. The edge_executor is currently structured to directly update the task list in the edge_job table within the queue_workload method. It needs to be updated to handle not only the existing workloads.ExecuteTask but also the newly added workloads.ExecuteCallback — and I'd like us to align on how to store this class in the DB. I see three possible approaches:

  • Fit it into the existing schema without changing the edge_job schema.
    This should work functionally, but it would require populating existing columns with meaningless values (dag_id, task_id, run_id).

  • Modify the edge_job schema.
    By adding a workload_type field, we can implicitly distinguish the expected schema per value. This could also be an opportunity to make changes to the PK and nullable constraints.

  • Add a separate edge_callback_job table.
    Fully separating the tables would guarantee clear and precise consistency.

All three approaches could work without any functional issues. I'd love to hear your thoughts.

@jscheffl
Copy link
Copy Markdown
Contributor

@jscheffl @dheerajturaga @ferruzzi

Hello! Following the multi-team work, I've been assigned to add functionality to the edge executor. As before, I've opened a draft PR first to discuss the direction before starting development.

This task involves changing the Deadline Alerts callback work so that it can be executed on the worker. The edge_executor is currently structured to directly update the task list in the edge_job table within the queue_workload method. It needs to be updated to handle not only the existing workloads.ExecuteTask but also the newly added workloads.ExecuteCallback — and I'd like us to align on how to store this class in the DB. I see three possible approaches:

* Fit it into the existing schema without changing the edge_job schema.
  This should work functionally, but it would require populating existing columns with meaningless values (dag_id, task_id, run_id).

* Modify the edge_job schema.
  By adding a workload_type field, we can implicitly distinguish the expected schema per value. This could also be an opportunity to make changes to the PK and nullable constraints.

* Add a separate edge_callback_job table.
  Fully separating the tables would guarantee clear and precise consistency.

All three approaches could work without any functional issues. I'd love to hear your thoughts.

Thanks for raising a short check. Actually have not thought about this and have no strong opinion developed.

An callback also is belonging to a Dag, Task, correct? So these details still could be used to populate details?

Note that the table structure and PK is originating from the time of Airflow 2 before a Task had a UUID. Therefore Dag_id, Task_id and Run_id were primary keys as it was the job PK. I matter of change I'd be OK to change the DB layout and switch to Task instance UUID or some other artificial UUID would also be OK and keeping the existing fields informational. In this case if these are not PK anymore you could fill will empty or a placeholder if not relevant/existing. But this would also impact REST API services, maybe some adjustment there are also needed (e.g. /jobs/state/...) as they carry the existing UUID.

Note that Core and worker might a bit diverging so if REST API is changed at least the existing endpoints need to be kept for compatibility.

If technically not tooo much overhead I'd slightly prefer a single table for all, then only 1 query is needed to poll for jobs.

Other opinions?

@wjddn279
Copy link
Copy Markdown
Contributor Author

@jscheffl

Yes, I have no objection to keeping it as a single table. I generally agree with the opinion you've provided, and I'll implement the details and discuss further from there. Thank you!

@jscheffl
Copy link
Copy Markdown
Contributor

@jscheffl

Yes, I have no objection to keeping it as a single table. I generally agree with the opinion you've provided, and I'll implement the details and discuss further from there. Thank you!

Cool!

@dheerajturaga + @ferruzzi still other opinions welcome :-D

@ferruzzi ferruzzi mentioned this pull request Mar 16, 2026
2 tasks
@ferruzzi
Copy link
Copy Markdown
Contributor

ferruzzi commented Mar 16, 2026

Edge is more or less your baby, so I'm going to defer to whatever you think here.

An callback also is belonging to a Dag, Task, correct? So these details still could be used to populate details?

As of right now, all deadline callbacks are tied to a dagrun. They are created and cleaned up as part of the dagrun lifecycle, so that means they obviously also have access to dag, dagbundle, serialized_dag, etc but not to Task. That may (and likely will?) change in the future, but that's what's active right now.

One thing to note is that there is a lot of work going on to unify TaskInstance and ExedcutorCallback into a shared interface we're calling Workload. So pretty much anywhere that had TaskInstanceKey or something like that would ideally be migrated to use workload.key, for example. You can see (and review ;) ) what I think may be the last batch of those changes in #63491 and an example of one of the other executors getting this treatment over in #62984 if that helps you think through what @wjddn279 is in store for.

Edit: Jens - I know you are aware of the other one and have reviewed it, that was more just a general comment for anyone who comes across this and wants context.

@wjddn279 wjddn279 force-pushed the edge-worker-callback-workload branch 2 times, most recently from 2419908 to ff72b46 Compare March 27, 2026 07:01
Comment thread providers/edge3/src/airflow/providers/edge3/cli/worker.py Outdated
@wjddn279 wjddn279 force-pushed the edge-worker-callback-workload branch 3 times, most recently from 1ac699c to 7e1bc89 Compare March 30, 2026 01:03
@wjddn279 wjddn279 force-pushed the edge-worker-callback-workload branch from 7e1bc89 to 9b3c5a4 Compare April 21, 2026 02:31
@wjddn279 wjddn279 force-pushed the edge-worker-callback-workload branch from 9b3c5a4 to 9d1b11d Compare April 26, 2026 05:19
@wjddn279 wjddn279 marked this pull request as ready for review April 27, 2026 00:07
@wjddn279 wjddn279 force-pushed the edge-worker-callback-workload branch from 089414b to ccda321 Compare April 27, 2026 00:07
Comment thread providers/edge3/src/airflow/providers/edge3/cli/worker.py Outdated
Comment thread providers/edge3/src/airflow/providers/edge3/executors/edge_executor.py Outdated
Comment thread providers/edge3/src/airflow/providers/edge3/models/types.py
@wjddn279 wjddn279 force-pushed the edge-worker-callback-workload branch from ccda321 to 2fec49c Compare April 27, 2026 06:29

command: Annotated[
ExecuteTask,
ExecuteTypeBody,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, maybe coming a bit late during second pass: So far the API was always backwards compatible, means if you have a Edge worker with a previous version running it would happily continue to run and softly "drain" if there is a provider or Airflow core version mis-match.

Now here the return type for job fetching changes, means if the worker would attempt to fetch a job with an old version, the job would be assigned but the worker (probably?) fails in de-serializing the content and fails. Probably before draining.

Have you tested this and how does it behave?
(Compared to Task SDK there is currently no Cadwyn layer for versioning, this is in the backlog...)

If this is a problem can we make it somehow compatible or make a parallel endpoint such that it is not failing on old clients until they drain and jobs are not pulled and never executed?

Copy link
Copy Markdown
Contributor

@jscheffl jscheffl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry in the second pass review came along one problem.

Additionally can I kindly request that you add one example /test callback into the integration_test.py Dag in providers/edge3/src/airflow/providers/edge3/example_dags/integration_test.py which serves as a (currently manual) system test?

@wjddn279
Copy link
Copy Markdown
Contributor Author

@jscheffl

Yes — I found a few issues while testing locally. I'll apply the fixes and request another review.

@wjddn279 wjddn279 marked this pull request as draft April 28, 2026 13:11
@wjddn279
Copy link
Copy Markdown
Contributor Author

pending until #66141

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:providers provider:edge Edge Executor / Worker (AIP-69) / edge3

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants